Algorithms defined as recursive functions, such as
in "pure" LISP, are shown to have structure sufficient
to distinguish between processes which must be
executed in sequence and processes which may be
executed in parallel.


An interpreter program is presented for executing
LISP programs and simultaneously computing the number
of processors required in order to achieve optimum
parallel processing. Sample program runs are presented
to show speed-up ratios between strictly sequential and
optimally parallel executions.


A possible hardware organization for a parallel
processing system derived from the interpreter program
is presented.

Both the growing volume of business data processing and
the increasing complexity of problems emerging from
scientific and military research are demanding greater
efficiency¹ from computer systems. The efficiency
attainable from the latest versions of traditional,
sequential machines appears to be approaching limits set by
the finite speed of electromagnetic wave propagation, while
more costly, non-traditional, parallel machines have been
encumbered by software complexities.


Because much of the processing done by sequential
machines consists of independent sub-processes, it has long
been recognized that efficiency could be improved if these
independent sub-processes could be performed simultaneously.
Considerable progress has been made in this area by building
more sophisticated systems from basic Von Neumann machines.
Multiprocessor systems enable multiple processors to share a
common core memory while simultaneously processing
independent programs. A distributed system represents a
network of computers, each with its own memory, working
together on independent (or nearly independent) programs in
order to solve a common problem.


But individual programs themselves may contain
independent segments which could be executed in parallel.
Machines which have been designed to perform parallel
executions of single programs represent significant
departures in organization from conventional machines.
Thurber and Wald [Ref. 1] provide a survey of parallel
processors and their organizations.

¹ As used herein, efficiency means a measure of
throughput or execution speed.


There is a reason for the unconventional architectures
of parallel processors. Single programs may be thought of
as mathematical functions which map a single input data set
to a single output data set. This means that the results of
parallel computations must eventually be brought together to
produce the final output. Hence, the actions of the
components of a parallel processor must be more tightly
coordinated than in a multiprocessor or distributed system.


There are several reasons why parallel machines have not
achieved widespread use. Certainly one reason is that many
users remain satisfied with sequential machines as long as
they continue to meet their efficiency requirements.
Another reason is that the parallel machines developed so
far require an additional degree of software complexity in
order to distinguish between parallel and sequential tasks.
Hardware cost is another reason. In the past, the cost of
logic elements was much greater than the cost of memory
elements. Memories were designed to be accessed through a
single port. These factors were in line with the Von
Neumann principles of sequential, centralized control of
computations and linearly organized memory.


But now, advances in technology are making it possible
to define a new set of principles for computer design.
Glushkov, et al. [Ref. 2] present a set of five
principles, quite different from the Von Neumann principles,
for the design of what they call recursive machines. The
advances already made by LSI technology make the
possibilities seem endless. Manufacturers are currently
producing single-chip computers (memory and CPU on one
chip). It is not inconceivable to imagine an array of
bipolar processors on a single chip. Memory technologies
are improving too, making it possible to access data faster
and in parallel.


A. DEVELOPING A PARALLEL SYSTEM 


In the past the development of parallel processor
systems has been characterized by the development of the
hardware organization first, followed by efforts to
implement compatible software. As an example, the ILLIAC IV
computer [Ref. 3] was designed to capitalize on the
parallelism inherent in problems where the data is naturally
structured in array form. The processing elements, each
with 2K of memory, are organized into four 8 x 8 arrays.
Kuck [Ref. 4] discusses the programming language
Tranquility which was designed for the ILLIAC IV.
Tranquility is an Algol-like language which provides the
programmer with sequential and simultaneous control
statements.

Ramamoorthy and Gonzales [Ref. 5] suggest two
approaches to the problem of recognizing program tasks which
can be executed in parallel. The first approach is to
provide the programmer with tools, like Tranquility, which
enable him to explicitly indicate tasks which can be
processed in parallel. The second approach involves
preprocessing the source program to analyze the
relationships between tasks and thus determine what parallel
processing is possible. Lamport [Ref. 6] presents two
methods for enabling parallel execution of Fortran DO loops.
Keller [Ref. 7] discusses methods whereby processors can
"look-ahead" to a limited number of sequentially organized
instructions to find instructions that can be executed
"out-of-order" without affecting the final outcome.

Research into methods of recognizing independent program
segments, and hence parallelism, within sequential
algorithms seems worthwhile since it may permit established
program libraries to be efficiently utilized on future
parallel processors. Stone [Ref. 8], however, points out
that efficient algorithms designed for parallel execution
may prove to be quite different from their serial
counterparts.


Based on the work done in developing parallel processing
systems so far, and on the recent and predicted advances in
LSI technology, a reasonable way to implement a general
purpose parallel processing system is to first develop a
software system for describing the parallel execution of
computer algorithms and then to organize the hardware so as
to physically implement the software system. This thesis
considers such a software system and suggests an approach to
the hardware organization.


B. SOFTWARE FOR A PARALLEL SYSTEM


The software system considered is a subset of an
existing language, LISP, whose syntax allows easy
recognition of parallel tasks within a program. In "pure"
LISP, programs are defined as recursive functions of
conditional expressions which act on ordered sets of input
data. When evaluating algorithms which are described
functionally, as in "pure" LISP,² the procedure is to
first evaluate the arguments and then to apply the function.
It is this simple procedure which differentiates between
what can be done in parallel and what must be done in
sequence. The arguments to a function represent processes
which can be executed in parallel, while the composition of
functions represents processes which must be executed
sequentially.

² Henceforth, the term LISP will be used to mean "pure" LISP.


In order to "reveal" the parallelism inherent in a LISP
program and to show that it is recognizable at
execution time, the LISP function evalquote2 has been
developed. Section III describes evalquote2 in detail.
Evalquote2 is similar to the universal function evalquote.
Evalquote is called a universal function (or interpreter)
because it can compute the result of any LISP function
applied to its arguments if the result is defined.
Evalquote2 also computes the result of any LISP function
applied to its arguments, and additionally, it monitors the
data flow graph which describes graphically the sequential
and parallel relationships between executing LISP
primitives. The output from evalquote2 includes a list of
integers representing the number of separate processors
required to optimize parallel processing at each stage of
execution.


In order to postulate the effect of running non-trivial
LISP programs in a hypothetical parallel processing
environment, evalquote2 is implemented by a
LISP-metalanguage translator and interpreter written in
Algol-W. This Algol-W program will henceforth be referred
to as the interpreter. Section IV explains the interpreter
and the results it has obtained from processing several
sample LISP programs.


C. MEASURING PARALLELISM


When proposing a parallel processing system it is
necessary to provide some measure of the expected
improvement in efficiency. Stone [Ref. 8] uses the speed-up
ratio which is defined for a given algorithm as the ratio of
the execution time for the best serial version of the
algorithm to the execution time for the best parallel
version of the algorithm. The interpreter provides a
similar measure of efficiency improvement for the sample
LISP programs evaluated. In this case, the speed-up ratio
is the ratio of the number of execution steps required for a
sequential execution to the number of stages required for a
parallel execution.


The sample programs analyzed by the interpreter were not
chosen because they generate particularly large speed-up
ratios. Rather, they were chosen as "typical" programs
offering a reasonable blend of conditional expressions,
functional composition, and recursion. The speed-up ratios
computed for these programs provide a very limited view of
the improved efficiency possible with a general purpose
parallel processing system. ILLIAC IV, the most widely
known of the existing parallel processors, is called an array
processor because it was developed to process a class of
algorithms for which the speed-up ratios are enormous.
Matrix multiplication is an example of an operation for
which large speed-ups are possible, and it is included among
the sample programs. In order to gain some insight into the
results expected from the sample programs, the remaining
paragraphs of this section will discuss the speed-ups
possible in matrix multiplication and summation algorithms.


D. THE EXAMPLE OF MATRIX MULTIPLICATION


Multiplication of two n x n matrices requires $n^3$
multiplications and $n^2(n-1)$ additions. When performed
sequentially, this process requires $n^3 + n^2(n-1)$ (or
$2n^3 - n^2$) steps, where a step is one addition or one
multiplication. Observe, however, that all of the $n^3$
multiplications are independent of one another and could be
done in one step consisting of $n^3$ simultaneous
multiplications (assuming there were $n^3$ multipliers
available).

The $n^2(n-1)$ additions represent the $n^2$ summations, each
of n products from the multiplications, which will produce
the $n^2$ elements of the product matrix. Certainly the $n^2$
summations are independent of one another and hence could be
performed in parallel.


Now consider the summation of n elements. Such a
summation requires n-1 additions. Because addition is
associative, the order in which the n-1 additions are
performed will not affect the outcome. Because addition is
a binary operation, the summation process can be started by
simultaneously adding $\lfloor n/2 \rfloor$ pairs of addends.³ The
summation process can then be reapplied to the $\lceil n/2 \rceil$
remaining elements. This procedure will still require a
total of n-1 additions, but only $\lceil \log_2 n \rceil$ steps are
required (assuming a minimum of $\lfloor n/2 \rfloor$ adders are
available for the first step).

³ For a real number x,
$\lceil x \rceil$ denotes an integer such that $x \le \lceil x \rceil < x+1$, and
$\lfloor x \rfloor$ denotes an integer such that $x-1 < \lfloor x \rfloor \le x$.
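The pairwise procedure just described can be sketched in a few lines of Python. This is only an illustration, not part of the thesis software: the function name `pairwise_sum` and its return values are assumptions introduced here. Each pass of the loop stands for one parallel step in which all pair additions are imagined to happen at once.

```python
def pairwise_sum(values):
    """Sum a list by repeatedly adding adjacent pairs.

    Returns (total, steps, adders_per_step), where adders_per_step[i]
    is the number of additions that could run simultaneously in step i.
    """
    level = list(values)
    if not level:
        return 0, 0, []
    adders_per_step = []
    while len(level) > 1:
        pairs = len(level) // 2          # floor(n/2) independent additions
        nxt = [level[2 * i] + level[2 * i + 1] for i in range(pairs)]
        if len(level) % 2:               # an odd element is carried forward
            nxt.append(level[-1])
        adders_per_step.append(pairs)
        level = nxt
    return level[0], len(adders_per_step), adders_per_step


total, steps, adders = pairwise_sum(range(1, 9))
print(total, steps, adders)
```

For eight addends the sketch performs seven additions in three steps, and the first step needs four adders, in agreement with the counts given above.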


Hence, the total process of matrix multiplication of two
n x n matrices could be performed in $1 + \lceil \log_2 n \rceil$ steps. Of
course, such a parallel computation would require $n^3$
multipliers for step 1, $n^2 \lfloor n/2 \rfloor$ adders for step 2, and
approximately half as many adders for each successive step.
Consider the multiplication of two 8 x 8 matrices. If done
sequentially, this process would require 960 steps. If done
with optimum paralleling, this process would require only 4
steps! Hence, a speed-up ratio of 240 could be achieved by
512 parallel multipliers and 256 parallel adders. A LISP
program which multiplies two 4 x 4 matrices is included
among the sample programs and will be discussed in Section
IV.
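The arithmetic in this example can be checked with a short Python sketch. The function name `matmul_step_counts` and the dictionary keys are assumptions made for illustration; the formulas are the ones stated in this section.

```python
def matmul_step_counts(n):
    """Step and processor counts for multiplying two n x n matrices."""
    multiplications = n ** 3
    additions = n ** 2 * (n - 1)
    sequential_steps = multiplications + additions      # 2n^3 - n^2
    summation_steps = (n - 1).bit_length()              # ceil(log2 n) for n >= 1
    parallel_steps = 1 + summation_steps                # one multiply step first
    return {
        "sequential_steps": sequential_steps,
        "parallel_steps": parallel_steps,
        "speed_up": sequential_steps / parallel_steps,
        "multipliers": multiplications,                  # all used in step 1
        "adders": n ** 2 * (n // 2),                     # needed in step 2
    }


print(matmul_step_counts(8))
```

For n = 8 the sketch gives 960 sequential steps, 4 parallel steps, a speed-up ratio of 240, 512 multipliers, and 256 adders.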


E. SPEED-UP FOR ASSOCIATIVE PROCESSES


There is a general result for the speed-up possible in a
process composed of associative sub-processes such as the
summation process just discussed. To develop this result it
is necessary to define some terms.


As used herein, the term "primitive" (or "primitive
process") refers to a member of the set of operations that
can be performed by a processor. A processor is an agent
(human or machine) which can carry out a process. The
process may be just a primitive, or it may be a composition
of primitives. In the example of multiplying two matrices,
the process of matrix multiplication was composed of the
primitives for scalar multiplication and addition. For some
processors, multiplication is a process composed of the
primitives shift and add.


The operands for primitives may be referred to as data
elements. A primitive may be unary, binary, or n-ary,
meaning that it processes one, two, or n data elements in
one processing step. The physical action implied by the
terms primitive process, process, and data elements can be
described abstractly by the terms primitive functions,
functions, and set elements.


The following general formula represents the speed-up
ratio which can be achieved by an ideal parallel processing
system⁴ for a general process composed of associative,
n-ary primitives acting on a set of N data elements.

$$\text{Speed-up ratio} = \frac{\lceil (N-1)/(n-1) \rceil}{\lceil \log_n N \rceil}, \qquad n \ge 2$$

The numerator represents the number of steps required for a 
sequential execution. Each sequential step would reduce the 
number of data elements remaining by n-1 until n or fewer 
elements remained. The final step would reduce the number 
of elements to one. The denominator represents the number
of steps required for a parallel execution. At each step,
successive n-tuples would be operated upon in parallel.
Brent [Ref. 9] provides an analysis of parallel execution 


times possible for arithmetic expressions in general. 
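As a rough check of the general formula, the following Python sketch computes the numerator and denominator with integer arithmetic only. The names `ceil_log` and `speed_up` are assumptions introduced for this illustration.

```python
def ceil_log(N, n):
    """Smallest k such that n**k >= N, i.e. ceil(log_n N) for N >= 1."""
    k, reach = 0, 1
    while reach < N:
        reach *= n
        k += 1
    return k


def speed_up(N, n=2):
    """Return (sequential_steps, parallel_steps, ratio) for N data
    elements reduced by associative n-ary primitives."""
    if n < 2 or N < 2:
        raise ValueError("requires n >= 2 and N >= 2")
    sequential = -(-(N - 1) // (n - 1))   # ceil((N-1)/(n-1))
    parallel = ceil_log(N, n)
    return sequential, parallel, sequential / parallel


print(speed_up(256, 2))   # binary addition of 256 elements
print(speed_up(9, 3))     # a ternary primitive on 9 elements
```

Using repeated multiplication rather than a floating-point logarithm avoids rounding errors when N is an exact power of n.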


⁴ An ideal parallel processing system is one that has
all the primitive processors it will ever need.

F. COMPARISON OF SUMMATION ALGORITHMS

Because summation is a process composed of binary,
associative additions, a speed-up ratio of $(N-1) / \lceil \log_2 N \rceil$
can be achieved when summing N data elements. Three
summation algorithms, one serial and two parallel, are
considered.


The following program segment represents a typical
FORTRAN subroutine for summing a vector of integers.

      FUNCTION SUM (INTGRS, N)
      DIMENSION INTGRS (N)
      ISUM = 0
      DO 1 I = 1, N
    1 ISUM = ISUM + INTGRS (I)
      RETURN

This serial algorithm actually requires N steps. Since the
FORTRAN DO loop will be processed at least once, it is
necessary to allow for the case where the vector contains
only one integer.


The following segment from Ref. 4 is a parallel
algorithm for the summation process written in TRANQUILITY.


BEGIN INTEGER ARRAY A[0:255]; INTEGER I,J,K;
   FOR (J) SEQ (0,...,7) DO
   BEGIN
      K := 2**J;
      FOR (I) SIM (0,...,255) DO
         A[I] := A[I] + A[(I+K) MOD 256]
   END
END
This segment is designed to use 256 processor elements to
sum exactly 256 elements. If the input data set has fewer
than 256 elements, the remaining elements of the array are
given zero values. If there are more than 256 elements the
extras are folded across the 256 processing element
memories, and each processor performs a serial summation
before the above segment is invoked. Referring to the
result developed in sub-section D, the number of steps
required for the parallel summation of 256 elements is
$\log_2 256$ or 8. The outer FOR loop represents this sequence
of 8 (0,...,7) steps. The inner FOR loop causes the
simultaneous execution (SIM) of the '+' primitive by each
processor (0,...,255). This inner loop will be executed a
total of 8 times after which each processing element will
contain the final sum.
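The behaviour described here can be simulated in Python. This is a sketch under stated assumptions and not a transcription of the TRANQUILITY segment: it assumes that the distance between the two operands doubles at each sequential step, and the function name `cyclic_parallel_sum` is hypothetical. The list comprehension stands in for the simultaneous (SIM) step, reading only values from the previous step.

```python
def cyclic_parallel_sum(values, size=256):
    """Simulate `size` processing elements summing by cyclic shifts.

    `size` is assumed to be a power of two. Inputs shorter than `size`
    are padded with zeros, as described in the text.
    """
    if len(values) > size:
        raise ValueError("fold the extra elements serially first")
    a = list(values) + [0] * (size - len(values))
    steps = 0
    shift = 1                                  # assumed to double each step
    while shift < size:
        old = a[:]                             # every element reads old values
        a = [old[i] + old[(i + shift) % size] for i in range(size)]
        shift *= 2
        steps += 1
    return a, steps


result, steps = cyclic_parallel_sum(range(1, 101))
print(result[0], result[255], steps)
```

After the eighth step every simulated processing element holds the same total, which matches the statement that each element will contain the final sum.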


The last summation algorithm presented here is in the
syntax of the LISP metalanguage.


sum[a] = [null[cdr[a]] → car[a]; T → sum[reduce[a]]]

reduce[a] = [null[a] → NIL; null[cdr[a]] → a;
             T → cons[add[car[a]; cadr[a]];
                      reduce[cddr[a]]]]


The variable 'a' represents a list of integers to be
summed. If the list contains two or more integers, the sum
function calls on the reduce function. The reduce function
adds successive integer pairs and returns a reduced list of
integers. The sum function is then applied to the reduced
list. This process continues until the list contains only
one integer which is the final sum. By computing these
arguments in parallel the above algorithm will generate the
sum of N elements in $\lceil \log_2 N \rceil$ addition steps.
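The sum and reduce functions described above can be mirrored in Python to count the addition stages. This is a sketch only: Python lists stand in for LISP lists, the names `reduce_pairs` and `lisp_sum` are assumptions, and the stage counter assumes that all additions made within one call of the reduce step would be performed simultaneously.

```python
def reduce_pairs(a):
    """Add successive pairs of a list and return the reduced list."""
    if not a:
        return []
    if len(a) == 1:
        return a
    return [a[0] + a[1]] + reduce_pairs(a[2:])


def lisp_sum(a, stages=0):
    """Return (sum, number_of_addition_stages) for a non-empty list."""
    if not a:
        raise ValueError("list must contain at least one integer")
    if len(a) == 1:
        return a[0], stages
    return lisp_sum(reduce_pairs(a), stages + 1)


print(lisp_sum([1] * 8))
```

For a list of eight 1's the sketch reports a sum of 8 reached in three stages.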


Section II provides some background information on data
flow graphs and LISP programs. This information is
necessary for understanding the development of evalquote2
discussed in Section III. Section V proposes a "skeletal"
hardware organization for implementing the parallel
execution described by evalquote2.

II. BACKGROUND


This section contains background information necessary
for understanding the development of evalquote2 in Section
III and the test programs discussed in Section IV. Included
is a discussion of LISP concepts and a modified version of
the LISP metalanguage syntax which is used for the programs
in this thesis. Data flow graphs are explained as a tool
for recognizing parallelism, and the g-vector is introduced
as a notational device for describing data flow graphs.


A. THE LISP LANGUAGE


The LISP language is best described by Section I of Ref.
10. An overview of the language will be provided here.
Appendix A contains the language syntax for the programs
used in this thesis. Because these programs were run on a
S/360 using EBCDIC characters, the notation differs slightly
from the notation published in Ref. 10.


A LISP program is a LISP function and an argument list
whose elements are S-expressions (symbolic expressions). An
S-expression can be one symbol called an atom, or it can be
an ordered list of S-expressions which is usually delimited
by parentheses. There are three primitive functions used to
manipulate S-expressions. The CAR function gives the first
element within an S-expression. The CDR function gives the
S-expression remaining after removal of the first element.
The CONS function takes two S-expressions and produces a
new S-expression by inserting the first S-expression as the
first element within the second S-expression. For example,
CAR<(A B C)> gives A, CDR<(A B C)> gives (B C), and
CONS<A; (B C)> gives (A B C). An "empty" list, (), is
equivalent to the atom NIL.


There are two primitive functions which are predicates.
EQ gives the atom T if its two arguments represent the same
atom, or F otherwise. ATOM gives the atom T if its argument
is an atom, or F otherwise.


The version of LISP used in this thesis includes the
primitives ADD and MUL which give the sum and product,
respectively, of two atoms which are non-negative integers.
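The seven primitives named in this overview can be modelled in Python for experimentation. The representation is an assumption made for this sketch: atoms are Python strings or integers, lists are Python lists, and NIL is the empty list. Dotted pairs are not modelled.

```python
NIL = []          # the empty list, treated as the atom NIL


def car(s):
    """First element of a non-empty list."""
    return s[0]


def cdr(s):
    """List remaining after removal of the first element."""
    return s[1:]


def cons(x, s):
    """New list with x inserted as the first element of s."""
    return [x] + s


def atom(s):
    """'T' if s is an atom (NIL counts as an atom), else 'F'."""
    return "T" if not isinstance(s, list) or s == NIL else "F"


def eq(a, b):
    """'T' if both arguments are the same atom, else 'F'."""
    return "T" if atom(a) == "T" and atom(b) == "T" and a == b else "F"


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


print(car(["A", "B", "C"]), cdr(["A", "B", "C"]), cons("A", ["B", "C"]))
```

The printed values correspond to the CAR, CDR, and CONS examples given in the text.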


By allowing the operations of composition and recursion,
the class of functions definable in terms of the primitive
functions can be expanded to the set of partial recursive
functions over the domain of S-expressions. McCarthy [Ref. 11]
gives a formal development of the class of functions
computable in terms of given base functions. Any function
belonging to the class of computable LISP functions can be
described with the use of LAMBDA and LABEL notations and
conditional forms.


These notations will be explained in terms of the syntax 
of Appendix A. Non-terminal symbols are in lower case 
letters. The LABEL notation looks like 

     @<FN: function>,
where FN is the name assigned to the function. The LABEL
notation allows the programmer to define recursive
functions. The LAMBDA notation looks like

ene AND t 6LOrMS:, 
where X1 through XN are dummy variables used within the form
which defines the function. When a LAMBDA function is
applied to a set of S-expressions, the dummy variables are
assigned the values of the corresponding S-expressions.

There are four possibilities for a form. It may be
merely a constant or a variable. Or it may be a
function with its own argument list. Or lastly, it may be a
conditional form. In the syntax of Appendix A, conditional
forms appear as a list of predicate-expression pairs.
Conditional forms are evaluated by evaluating successive
predicates until one of them evaluates to T. The value of
the corresponding expression then becomes the value of the
entire conditional form.


A more general notation for a conditional form is (p → c, a),
where p is the premise, c is the conclusion, and a is
the alternative. The premise is a propositional form which
evaluates to a truth value. The value of the premise
determines whether the conclusion or the alternative will
give the value of the form. The conclusion and the
alternative may themselves be conditional forms. Reference
11 includes a detailed discussion of the formal properties
of conditional forms.
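The evaluation rule for conditional forms can be illustrated with a small Python sketch. Zero-argument functions (thunks) are used so that a predicate or expression is evaluated only when the rule calls for it. The names `eval_conditional` and `classify`, and the `trace` list used to record what was evaluated, are assumptions introduced for this illustration.

```python
def eval_conditional(pairs):
    """Evaluate a list of (predicate, expression) thunk pairs.

    Predicates are tried in order; the expression paired with the
    first predicate that yields 'T' gives the value of the form.
    """
    for predicate, expression in pairs:
        if predicate() == "T":
            return expression()
    raise ValueError("no predicate evaluated to T")


trace = []


def classify(n):
    """Example form: (n = 0 -> ZERO; T -> NONZERO)."""
    def zero_value():
        trace.append("ZERO branch")
        return "ZERO"

    def nonzero_value():
        trace.append("NONZERO branch")
        return "NONZERO"

    return eval_conditional([
        (lambda: "T" if n == 0 else "F", zero_value),
        (lambda: "T", nonzero_value),
    ])


print(classify(0), classify(7), trace)
```

Only the expression belonging to the first true predicate is evaluated, which is why the trace records exactly one branch per call.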


B. FUNCTIONALS AND PROGRAM ORGANIZATION 


As defined by the syntax in Appendix A, an argument can
be a form or a function. Functions which accept functions
as arguments are called functionals. The best known
functional in LISP is evalquote. Evalquote takes as
arguments any LISP function and its argument list and gives
the result of the function applied to its arguments, i.e.,

     EVALQUOTE<function; (argument...argument)>
is equivalent to
     function<argument;...;argument>.
Reference 10 describes evalquote. Appendix D contains a
program listing of evalquote in the syntax of Appendix A.

The concept of functionals has powerful implications.
One of the features of functionals is that they enable the
programmer to achieve the same economy of expression and
memory space that is achieved in other programming languages
through the use of subroutines. A function can be defined
once as an argument to a LAMBDA function. The function
definition is then paired with a variable and can be used
repeatedly in the defining form of the LAMBDA function.


The sample programs presented in this thesis make use of
functionals in this way. The general organization of these
programs is given below. Comments are enclosed in single
quotes. Non-terminal symbols are in lower case letters.


Sac VAR Ie... > VARND ;: ‘program variables! 
Pacis. oe Me ‘function names! 
form> ‘defining form for the program 


in terms of V1 through VN 
and Fi through Fi! 
Gynec ilne <1 Ol: Prunce2on derinitions' 
YEZUerine 1 On: 


Patt Lunet1on>> 'end of program function' 
<3 Cris. ss 7S ee D NS 'program argument list! 
= 'SOr symbol ' 


C. DATA FLOW GRAPH 


A data flow graph is a graphical description of the
execution of a specific program. The entities depicted by a
data flow graph are data elements and primitive processes.


For the sample data flow graphs of this section, the
nodes are labeled with primitive processes and the edges are
labeled with data elements. Figure 1 is the data flow graph
for the FORTRAN function SUM (from Section I) where the data
is an integer vector of eight 1's. Only the primitive '+'
is used in the graph. Figure 2 is the data flow graph for
the LISP function SUM (also from Section I) applied to a
list of eight 1's. Again only the primitive 'ADD' is used
in the graph. For a specific program execution the data
flow graph illustrates the execution order among the
primitive processes, specifically describing which processes
can be performed simultaneously and which must be performed
in sequence.


In graph theory, an edge is usually defined by the two
nodes to which it is connected. There are edges in the
graphs of figures 1 and 2 which are connected to only one
node. This is only a superficial discrepancy which can be
corrected by viewing the data elements as nodes and the
primitive processes as edges. Figure 3 shows this dual form
for the graph of figure 2. In the dual form, a binary
primitive process is represented by two edges.

In graph theory a path is defined as any sequence of
edges in which each successive edge originates from the
terminal node of the preceding edge. A data flow graph is
an acyclic directed graph. The term "directed" means that a
direction is associated with each edge. "Acyclic" implies
that no edges are repeated in any path. The length of a
path is defined as the number of edges in the path. The
length of the longest path in the dual form of a data flow
graph represents the number of execution stages which would
be required in a parallel execution.


The width of a data flow graph at a particular stage of
execution is defined herein as the number of primitive
processes to be executed at that stage. Hence, the maximum
width represents the minimum number of processors required
for optimum parallel execution.

Figure 1. Graph of Fortran Sum
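The notions of stage and width can be made concrete with a short Python sketch. The graph representation is an assumption made here: each primitive process is a key in a dictionary, mapped to the list of primitive processes whose results it consumes. A primitive with no such predecessors runs in stage 1; otherwise it runs one stage after its latest predecessor. The function name `stage_widths` is hypothetical.

```python
from collections import Counter


def stage_widths(preds):
    """Return the list of widths, one entry per execution stage."""
    stage = {}

    def stage_of(node):
        if node not in stage:
            stage[node] = 1 + max((stage_of(p) for p in preds[node]), default=0)
        return stage[node]

    for node in preds:
        stage_of(node)
    counts = Counter(stage.values())
    return [counts[s] for s in range(1, max(counts, default=0) + 1)]


# Eight additions chained one after another (serial summation).
serial = {i: ([i - 1] if i > 0 else []) for i in range(8)}

# Seven additions arranged as a balanced tree (pairwise summation).
tree = {"a0": [], "a1": [], "a2": [], "a3": [],
        "b0": ["a0", "a1"], "b1": ["a2", "a3"],
        "c0": ["b0", "b1"]}

print(stage_widths(serial), stage_widths(tree))
```

The length of each returned list is the number of stages and its maximum entry is the number of processors needed. The chain needs eight stages of width one, while the tree needs three stages with a maximum width of four.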


Data flow graphs should not be confused with program
graphs. Program graphs represent abstractions of flow
charts and are useful in the analysis of algorithms. They
highlight the flow of control without regard to a particular
data set. Program graphs are usually cyclic digraphs.


PAIRLIS is a LISP function which pairs together elements
of two S-expressions and appends the resulting list of pairs
to an existing list. PAIRLIS is used by evalquote to pair
variables with their corresponding S-expression values and
store these pairs on the association list. Program 1 of
Appendix D is PAIRLIS<(A B); (1 2);((C.3))>. Figure 4 shows
the data flow graph depicting the execution of this program.
This graph includes all the primitives used in PAIRLIS and
illustrates the order in which they are executed. Those
primitives which line up vertically may be executed in
parallel. Otherwise, the primitives are executed in order
from left to right.


D. G-VECTOR


The g-vector (graph vector) is a list of integers which
describes a data flow graph which in turn describes a
program execution. The general form of a g-vector is
$(w_1\ w_2\ \ldots\ w_n)$. The number of elements in the vector, n,
represents the maximum path length of the data flow graph
which is the minimum number of steps (or stages) required
for the entire computation. Each element, $w_i$, of the
g-vector represents the width of the data flow graph (or the
minimum number of machine level processors required for
optimum paralleling) at step i in the computation. The sum
of the elements in the g-vector represents the total number
of primitive processes performed in the computation.


Each data element in the data flow graph describing the
execution of a LISP program has associated with it a
g-vector. The g-vector describes a sub-graph that
represents the computations performed to produce the data
element. The g-vector may be empty as is the case with
constants.


There are two binary operations which can be performed
on a pair of g-vectors. A g-vector may be appended to
another g-vector to produce a longer resultant g-vector.
The resultant g-vector represents the data flow graph for
two processes which were performed in sequence. A g-vector
may be combined with another g-vector by summing their
corresponding elements. The resultant "wider" g-vector
represents the data flow graph for two processes which were
performed in parallel. The g-vector for the data flow graph
of Figure 4 is (142421 1).
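The two g-vector operations can be sketched in Python with plain lists of integers. The function names `append_g` and `combine_g` are assumptions, as is the choice to align the two vectors at their first elements when combining, so that a shorter vector is treated as having zero width in the later stages.

```python
from itertools import zip_longest


def append_g(first, second):
    """g-vector for two processes performed in sequence."""
    return list(first) + list(second)


def combine_g(left, right):
    """g-vector for two processes performed in parallel."""
    return [a + b for a, b in zip_longest(left, right, fillvalue=0)]


g1 = [2, 1]        # e.g. two primitives in stage 1, one in stage 2
g2 = [1]           # a single primitive

print(append_g(g1, g2))    # sequence: longer vector
print(combine_g(g1, g2))   # parallel: wider vector
```

Both operations preserve the total number of primitive processes, since the sum of the elements of the result equals the sum over both inputs.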

Evalquote2 is a LISP function similar to evalquote. For
input, evalquote2 takes two S-expressions. The first
S-expression represents any LISP function, and the second
S-expression represents a valid argument list for that
function. Like all LISP programs, the output from
evalquote2 is a single S-expression. The CAR of this
S-expression represents the result of the input function
applied to the input argument list. The CDR of the output
S-expression is the g-vector (a list of integers) describing
the data flow graph resulting from the application of the
input function to the input argument list. Appendix B is a
listing of evalquote2 applied to the PAIRLIS function of
Figure 4.


The sub-functions used to define evalquote2 are similar
to the sub-functions used to define evalquote along with
some additional functions used to compute the g-vector.
These sub-functions will be discussed shortly, but first
will be a discussion of the logic used by evalquote2 to
compute the g-vector.


A. LOGICAL DEVELOPMENT 


The arguments for a function are independent of one
another and may be evaluated simultaneously (in parallel).
The evaluation of each argument produces a resultant
S-expression and g-vector. When an argument is a function
with its own argument list, the g-vector associated with the
resultant data element describes the data flow graph
determined by the application of the function to its
argument list.


When all of the arguments have been evaluated, their
associated g-vectors are combined to produce a single
g-vector. The g-vectors are combined by summing their
corresponding elements. The resultant g-vector describes
the data flow graph which describes the parallel evaluation
of the arguments. The length of the resultant g-vector will
be equal to the length of the longest g-vector created in
the evaluation of the argument list. Each element of the
resultant g-vector represents the number of primitive
functions that were executed at that stage in the parallel
evaluation of the argument list.


When all the arguments have been evaluated, the function
can be applied. The application of the function to the
evaluated argument list is described by a new data flow
graph. The g-vector for this graph is appended to the
g-vector for the combined argument list to produce a longer
g-vector. This longer g-vector describes the total data
flow graph which represents the evaluation of both the
function and its argument list.
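The rule of combining the argument g-vectors and then appending the g-vector of the application can be sketched as a tiny evaluator in Python. This is not evalquote2 itself. It is a simplified illustration under stated assumptions: expressions are nested tuples of the form `(op, arg, arg)`, only the primitives ADD and MUL are handled, integer constants have empty g-vectors, and applying a primitive contributes the g-vector (1). The name `evaluate` is hypothetical.

```python
from itertools import zip_longest

PRIMITIVES = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}


def combine_g(left, right):
    """Sum corresponding elements (parallel evaluation)."""
    return [a + b for a, b in zip_longest(left, right, fillvalue=0)]


def evaluate(expr):
    """Return (value, g_vector) for a nested primitive expression."""
    if isinstance(expr, int):
        return expr, []                      # constants need no evaluation
    op, *args = expr
    values, g_args = [], []
    for arg in args:                         # arguments are independent
        value, g = evaluate(arg)
        values.append(value)
        g_args = combine_g(g_args, g)
    return PRIMITIVES[op](*values), g_args + [1]   # then apply the primitive


balanced = ("ADD", ("ADD", ("ADD", 1, 1), ("ADD", 1, 1)),
                   ("ADD", ("ADD", 1, 1), ("ADD", 1, 1)))
chained = ("ADD", ("ADD", ("ADD", 1, 1), 1), 1)

print(evaluate(balanced), evaluate(chained))
```

The balanced expression sums eight 1's with a g-vector of three stages, while the chained expression needs one stage per addition.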


When the defining form for a function is a conditional,
the g-vector must be computed in a way which describes the
evaluation of a conditional form. As discussed previously,
the general form for a conditional is (p → c, a). The
possibility of parallel evaluation of p, c, and a will be
discussed later. Normally p (the predicate) is evaluated
first and then c (the conclusion) or a (the alternative) is
evaluated next depending on the value of p. Hence, the
g-vector for p is computed first, and then the g-vector for
c or a is appended. The resultant g-vector describes the
sequential evaluation that has occurred.
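The sequential treatment of a conditional can be illustrated as follows. This is a sketch under stated assumptions rather than the thesis code: the function name `graph_conditional`, the use of zero-argument callables to stand for unevaluated forms, and the sample g-vectors are all hypothetical.

```python
def graph_conditional(eval_predicate, eval_conclusion, eval_alternative):
    """Evaluate (p -> c, a) sequentially and return (value, g_vector).

    Each argument is a zero-argument callable returning (value, g_vector);
    only the selected branch is ever called.
    """
    truth, p_vector = eval_predicate()
    branch = eval_conclusion if truth else eval_alternative
    value, branch_vector = branch()
    return value, list(p_vector) + list(branch_vector)


calls = []  # records which forms were actually evaluated


def predicate():
    calls.append("p")
    return True, [1, 1]


def conclusion():
    calls.append("c")
    return "C", [2, 1]


def alternative():
    calls.append("a")
    return "A", [3]


result = graph_conditional(predicate, conclusion, alternative)
print(result)  # ('C', [1, 1, 2, 1])
print(calls)   # ['p', 'c']
```

Because the alternative is never called, it contributes nothing to the g-vector, which matches the statement that only the g-vector for c or a is appended.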


B. SUB-FUNCTIONS


To make the following explanations of the sub-functions
used in evalquote2 easier to follow, it may be helpful to
scan Appendix B before proceeding.


Apply2 computes the g-vector describing the application
of a function to its arguments. The parameters for apply2
are similar to those for apply except that the second
parameter represents both the argument list and the combined
g-vector describing the parallel evaluation of the
arguments. Notice that when apply2 is first called by
evalquote2, the g-vector for the argument list is empty.
This is because the initial arguments are all S-expressions
and need no evaluation. If the function is a primitive,
then the g-vector describing the application is (1).
Therefore, (1) is appended to the existing g-vector and this
new g-vector is associated with the resultant data element.


If the function is a lambda or label expression, or a
previously defined function, then the defining form will be
evaluated by eval2. Eval2 will return a data element
associated with a g-vector describing the evaluation. The
sub-function compose is then used to append the g-vector
returned by eval2 to the g-vector that came with the
argument list.


Eval2 is similar to eval in that it evaluates forms.
The difference is that eval2 associates a g-vector with an
S-expression (resultant data element) for each evaluation.
If the form is a variable or a constant the g-vector is
empty. If the form is a conditional then evcon2 is called.
Evcon2 is similar to evcon except that it also returns a
g-vector describing the evaluation of the conditional.
Evcon2 evaluates the first predicate and calls on graphcon.
If the predicate is true, graphcon evaluates the
corresponding expression and calls compose. Compose appends
the g-vector for the expression to the g-vector for the
predicate and associates the resultant g-vector with the
resultant data element. If the predicate is false, graphcon
calls compose with the result of evcon2 applied to the
remainder of the conditional and the g-vector of the first
predicate.


If the form given to eval2 is a function with its 
argument list, the argument list is given to evlis2 for 
evaluation. Evlis2 is similar to evlis in that it evaluates 
arguments, but it also combines the g-vectors of evaluated 
arguments to produce a resultant g-vector describing the 


parallel evaluation of the arguments. 


The sub-function compose is used to compute g-vectors
resulting from the composition of functions. Composition of
functions describes computational processes which must
naturally occur in sequence. Compose is called from apply2
and graphcon. Append is a standard LISP function used to
create a new list of the top level elements of two input
lists. Append is called from apply2 and compose. Combine
is used to compute g-vectors representing the parallel
evaluation of arguments. Combine is called from evlis2.
Sum is used by combine for adding corresponding elements of
two g-vectors. The remaining sub-functions are identical to
sub-functions used in evalquote.


Evalquote2 has been implemented through an interpreter 
program written in Algol-W. This section documents the 
interpreter which has been compiled to run on a S/360. Also 
discussed are the sample LISP programs which were run under 


the interpreter and their results. 


A. THE ALGOL-W INTERPRETER


Appendix E contains a source listing for the
Interpreter. Functionally, the Interpreter is nearly
identical to evalquote2. That is, it produces the result of
a LISP function applied to its arguments along with the
associated g-vector. The following paragraphs summarize the
organization of the Interpreter.


For input, the Interpreter accepts programs written
in the metalanguage syntax of Appendix A. Input is expected
from 80-character records and can be written free-form
(column independent).


If the first character of a record is a '$', the
second character represents a toggle and causes a logical
variable to be reset. The remainder of the record is
ignored and may be used to comment on the reason for the
toggle. If the toggle is a '$', it causes the toggle
records to be listed until the next '$$' record is
encountered. '$L' resets the LISTING variable which is
turned on initially. '$T' resets the TRANS variable which
is turned off initially. When on, TRANS causes the
S-expression translations of the input function and argument
list to be printed on the output device. '$A' causes only
the arithmetic operators (ADD and MUL) to be included in the
computation of the g-vector.


If the first character of a record is a '*', the
entire record is considered a comment. Single quotes are
used to delimit in-line comments.


2. Translation


The SCANNER routine reads tokens (identifiers, 
constants, numbers, and specials) from the input stream. 
The translation routines change the program into two 
S-expressions (one for the function and one for the argument 
list) and store them in memory in the form of linked lists. 
The translation routines function in accordance with the 
translation rules of Appendix C. These translation rules 
were derived from the rules for translating M-expressions to 


S-expressions presented in Ref. 10. 


3. Interpretation 


Because Algol-W supports recursion, the
interpretation routines are nearly identical to evalquote2
which was explained in the previous section. The evalquote2
procedure within the Interpreter may be viewed as a
microprogram in a hypothetical LISP machine. The machine's
memory already contains an S-expression for a function and
an S-expression for an argument list. The machine generates
a resultant S-expression containing the program result and
the g-vector.


4. Output


The output from the Interpreter includes the
resultant S-expression containing the program result and
g-vector and also a summary of the information contained in
the g-vector. The summary includes the number of processing
elements required for a parallel execution, the number of
execution steps required for both a sequential and a
parallel execution, and the speed-up ratio. Additional
output from the Interpreter includes diagnostic error
messages for the more common syntactic and semantic errors.
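The summary figures can all be derived from the g-vector itself. The sketch below assumes, following the description of the g-vector given earlier, that the sequential step count is the sum of the elements, the parallel step count is the number of elements, and the processor requirement is the largest element. The function name, the dictionary keys, and the sample vector are hypothetical; the Interpreter's actual output format is not reproduced here.

```python
def summarize(g_vector):
    """Derive the summary figures from a g-vector (list of stage widths)."""
    sequential_steps = sum(g_vector)       # one primitive per sequential step
    parallel_steps = len(g_vector)         # one stage per parallel step
    processing_elements = max(g_vector, default=0)
    speed_up = sequential_steps / parallel_steps if parallel_steps else 0.0
    return {
        "processing_elements": processing_elements,
        "sequential_steps": sequential_steps,
        "parallel_steps": parallel_steps,
        "speed_up": speed_up,
    }


summary = summarize([4, 2, 1, 1])
print(summary["sequential_steps"], summary["parallel_steps"])  # 8 4
print(summary["processing_elements"], summary["speed_up"])     # 4 2.0
```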


B. SAMPLE PROGRAMS 


Appendix D contains the sample LISP programs which were
run under the Interpreter. Program 1 is the PAIRLIS
function with the same arguments used to generate the data
flow graph of Figure 4. Program 1 is included to illustrate
the output from the Interpreter. The output includes the
program listing followed by the S-expression translations of
the function and argument list (enabled by the $T toggle).
The CDR of the resultant S-expression can be compared with
the data flow graph of Figure 4.


Programs 2 through 5 represent matrix multiplication
of two 4 x 4 matrices in which all the elements are 1's.
Hence, the resulting product matrix is a 4 x 4 of all 4's.
The S-expression representation for the first factor is in
row major order while the second factor is in column major
order.


Programs 2 and 3 use the same algorithm to compute
the matrix product. Program 3 uses the $A toggle to include
only arithmetic operations in the computation of the
g-vector. The MATMUL function computes the rows of the
product matrix by calling the row function. The row
function computes the elements of each row by calling the
dot function. The dot function computes the dot product of
each row of the first factor with each column of the second
factor. The dot function is defined so as to sequentially
add the integer products of vector elements. As discussed
in Section I, this is not the optimum way to define an
associative process for a parallel processor.


Programs 4 and 5 use the sum function presented in
Section I to optimize the summation required for each dot
product. The g-vector computed for program 5 considers
arithmetic operations only. Note that the results for
program 5 correspond to the theoretical results discussed in
Section I. That is, the number of required sequential steps
is $2n^3 - n^2$, or 112 for n=4. The number of parallel steps
is $1 + \lceil \log_2 n \rceil$, or 3 for n=4. Because
program 3 performs additions sequentially, it requires one
more parallel step than program 5.
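The arithmetic behind these counts can be checked with a short calculation. The sketch below is an illustration only: the function name is hypothetical, and it assumes that the sequential summation used by programs 2 and 3 takes n - 1 addition stages after a single multiplication stage.

```python
def matmul_step_counts(n):
    """Arithmetic step counts for multiplying two n x n matrices."""
    multiplications = n ** 3
    additions = n ** 2 * (n - 1)
    sequential = multiplications + additions   # equals 2n^3 - n^2
    log_steps = (n - 1).bit_length()           # ceil(log2 n) for n >= 1
    tree_parallel = 1 + log_steps              # pairwise summation of products
    chain_parallel = 1 + (n - 1)               # one addition after another
    return sequential, tree_parallel, chain_parallel


print(matmul_step_counts(4))  # (112, 3, 4)
```

For n=4 the pairwise summation needs 3 parallel steps and the sequential summation needs 4, which is the one-step difference noted above.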


The speed-up ratios computed for programs 4 and 5
might be considered as lower and upper bounds, respectively,
for an actual speed-up ratio (one that compares an actual
parallel machine with an actual sequential machine).
Program 5 ignores the data accesses represented by CAR, CDR,
and CONS operations and also execution controls represented
by the EQ operation. Obviously, the data must be moved into
position in order to be operated upon, even if the movement
of data takes place in parallel. Program 4 computes the
speed-up ratio by giving the same weight to CAR, CDR, CONS,
and EQ as it gives to ADD and MUL. This is not necessarily
a correct assumption either, since data can normally be
accessed from a high-speed memory faster than the arithmetic
operations can be performed.


Programs 6 and 7 represent the symbolic
differentiation of a fourth degree binomial, $(x + y)^4$, with


respect to x. The function consists of two primary


sub-functions. DIFF computes the derivative by the rules 
for differentiating algebraic expressions. SIMP simplifies 


the result by eliminating factors of "1" and addends of "0."


For program 6 the S-expression representation of
$(x + y)^4$ is
(((x + y) * (x + y)) * ((x + y) * (x + y))).
In this expression the data is arranged symmetrically. For
program 7 the data is arranged asymmetrically and looks like
((x + y) * ((x + y) * ((x + y) * (x + y)))).
As expected the speed-up ratio is greater for program 6 
(5.2) than for program 7 (4.1). This comparison was made to 
provide an example in which symmetrically organized data 
caused a greater speed-up ratio than the same data organized
asymmetrically.


Program 8 is the universal function evalquote. The
arguments for evalquote are the S-expression translations
from program 1. The speed-up ratio for program 8 is 1.86.
This is the smallest speed-up ratio of all the sample
programs. Additional runs were made with this program in
which the PAIRLIS function paired lists of three elements
each and lists of four elements each. Each run produced
essentially the same speed-up ratio (1.86).


For the matrix multiplication examples, the data is
two-dimensional. As the size of the matrix factors is
increased, the resulting data flow graph widens at a greater
rate than it lengthens. In fact it widens at a rate
proportional to $n^3$ (the number of simultaneous
multiplications) while it lengthens at a rate proportional


to $\log n$. Program 8, on the other hand, is operating on


one-dimensional data. Increasing the size of the data 


elements for PAIRLIS causes the total number of elements in 
the data flow graph to increase at the same rate as the 
length increases. Hence, the speed-up ratio remains 


essentially constant.


VI. HARDWARE CONSIDERATIONS


The problem addressed in this section is how to design a
hardware system which will implement the parallel execution
of algorithms which are defined by a software system such as
"pure" LISP. A detailed hardware design will not be given.
Rather, a "skeletal" design for the hardware will be
presented at a functional level in order to bring out some
of the considerations involved in any design.


Before proceeding with the example design, some
clarification of terminology will be given. A parallel
processing system refers to both the software and hardware
portions of a complete system. A parallel processing
machine (or computer) refers to the hardware alone,
including at least memory and processors. A parallel
processing system might include one or more parallel
processing machines. The processor module refers to the
module within a machine which contains the processing
elements. A processing element refers to a single
processor.


A parallel processing machine must perform a function
similar to evalquote2. That is, it must evaluate a LISP
function applied to its arguments and in the process
recognize parallelism. The data for this machine, both
functions and operands, is in the form of ordered sets
(parenthetical expressions). This data can be stored in a
sequentially organized memory in the form of linked lists.


A. PROCESSOR MODULE


Ideally, the processor module would be constructed on a
single chip. Figure 5 is a modular diagram of a processor
module. Each processing element in the module represents a
hardware implementation of evalquote. There are three
inputs to a processing element. The first input is a memory
address for a program or a form as defined by the
metalanguage syntax. The second input is a device address
for returning the result to be computed. The third input is
the memory address of the applicable association list. The
outputs from a processing element are the result of the
function applied to its arguments and the address of the
device for which this result is destined.


The processor manager controls data transfers
between the processing elements. The processor manager also
keeps track of the status (busy or free) of each processing
element. Initially, a LISP program (a list containing a
function and constant arguments) is made available to a
processor module from an external agent such as a terminal
user or another parallel processing machine. After the
program is read into memory, the processor manager assigns
the program to a processing element along with a return
address to an external device (terminal, printer, external
storage, or another parallel processing machine). As the
processing element recognizes processes which can be
computed in parallel, these processes are made available to
the processor manager for assignment to other available
processing elements. The return address for these processes
will be the processing element that originated them. When
all processing elements are busy the processor manager will
queue waiting processes.


Figure 5. Processor Module


The memory manager controls the common memory.
Between programs, the memory manager converts memory into a
single list of free storage cells. This list is made
available for reading a new program into memory. Once
execution has begun, the memory manager provides free
storage cells to each processing element for use in
performing the CONS primitive. Garbage collection is
performed by monitoring the association-list stack in each
processing element and returning links from outdated lists
to the free storage list.
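The bookkeeping attributed to the processor manager above (status tracking, assignment with a return address, and queueing when every element is busy) might be modeled as follows. This is a toy sketch, not a hardware design: the class and method names are hypothetical, and memory management, timing, and the transfer bus are deliberately left out.

```python
from collections import deque


class ProcessorManager:
    """Toy model of the processor manager's free/busy bookkeeping."""

    def __init__(self, element_count):
        self.free = deque(range(element_count))  # idle processing elements
        self.busy = {}       # element -> (process, return address)
        self.waiting = deque()                   # processes awaiting an element

    def submit(self, process, return_address):
        """Offer a process for execution; it waits if no element is free."""
        self.waiting.append((process, return_address))
        self._assign()

    def complete(self, element):
        """Mark an element free and report where its result must be sent."""
        process, return_address = self.busy.pop(element)
        self.free.append(element)
        self._assign()
        return process, return_address

    def _assign(self):
        # Hand queued processes to free elements in first-come order.
        while self.free and self.waiting:
            self.busy[self.free.popleft()] = self.waiting.popleft()


manager = ProcessorManager(element_count=2)
manager.submit("program", "terminal")  # original program from an external device
manager.submit("argument-1", 0)        # spawned by element 0, result returns to 0
manager.submit("argument-2", 0)
print(len(manager.waiting))            # 1
print(manager.complete(1))             # ('argument-1', 0)
print(manager.busy[1])                 # ('argument-2', 0)
```

With only two elements, the third process waits in the queue until element 1 completes, at which point it is assigned immediately.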


3. Timing 


By constructing the processor module on a single
chip, it can be controlled by a single timer. Two basic
clock cycles, a compute cycle and a transfer cycle, are
required.


The compute cycle enables all the processing
elements to perform a computation if they have one to
perform. Also during this cycle, the processor manager's
list of available processors is updated via the status flags
on each of the processing elements. During the compute
cycle the memory manager can perform garbage collection or
issue free storage cells.


During the transfer cycle the processor manager
performs a linear sweep of the processing element output
ports. As an output port with data to be transferred is
swept, the destination (indicated by the return address) is
enabled and the data transferred. One entire sweep is
performed in a single transfer cycle. Simultaneously,
during the transfer cycle, the memory manager performs a
linear sweep of the memory ports for each processing
element. All memory read and write operations are performed
during a single transfer cycle.


B. PROCESSING ELEMENT


Figure 6 is a modular diagram of a single processing
element. The purpose of a processing element is to accept,
as input, a form, as defined by the metalanguage syntax, and
to produce, as output, the evaluated result of the form.
The actions of the processing element are controlled by the
decode-and-control module (DCM). The DCM contains the
microprograms for all the primitive functions (CAR, CDR,
CONS, ADD, etc.). The DCM also manages four pushdown stacks
which are required for evaluating complex forms. The DCM
also controls the processing element status register and the 
I/O to the processor transfer bus and the memory bus.
Figure 7 is a functional flow chart describing the tasks 


performed by the DCM. 


The inputs to a processing element are provided by the
processor manager or by the processing element's own DCM.
The first input is the address of a form and goes into the
program register. The second input is the address of the
association list for the form. If the form is an original
program (function and constant arguments) from an external
device, then the association list will be empty. The
incoming return address register receives the address of the
device to which the result will be sent.


C. DECODE AND CONTROL MODULE


The DCM is designed to implement the overall policy of
the parallel processing system. That policy is to perform
sequential processes in sequence and to enable parallel
processes to be performed in parallel. Hence, if the form
that is input to the program register is a function and a
set of unevaluated arguments, then the DCM will store the
function on the function pushdown, input one of the
arguments to its own processing element, and send the
remaining arguments to the processor manager for parallel
evaluation by other available processing elements. When all
the arguments have been evaluated and returned to the forms
pushdown, the DCM will pop the function into the current
function register. If the function is a primitive, it is
decoded by the DCM and applied to the arguments. If the
function is a lambda expression, the DCM creates a new
association list in memory by pairing the variables of the
lambda expression with the evaluated arguments on the
pushdown. The DCM then inputs to the program register the
remainder of the lambda expression, which is a form defining
the function.
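The pairing step for a lambda expression, together with the variable search that uses the resulting association list, can be sketched as follows. This is an illustration in Python, not the DCM microprogram: the representation of an association list as a list of pairs, the name `lookup`, and the sample bindings are assumptions, and the name `pairlis` is borrowed from the LISP function of the same name.

```python
def pairlis(variables, values, association_list):
    """Pair variables with values and place the pairs ahead of the old list."""
    if len(variables) != len(values):
        raise ValueError("each variable needs exactly one evaluated argument")
    return list(zip(variables, values)) + list(association_list)


def lookup(variable, association_list):
    """Return the value bound to a variable; the first match wins."""
    for name, value in association_list:
        if name == variable:
            return value
    raise KeyError(variable)


outer = [("x", "A")]                            # bindings already in force
inner = pairlis(["x", "y"], ["B", "C"], outer)  # bindings for a lambda body
print(inner)               # [('x', 'B'), ('y', 'C'), ('x', 'A')]
print(lookup("x", inner))  # B
```

Placing the new pairs at the front means the most recent binding of a variable is found first, so the body of the lambda expression sees its own arguments before any outer bindings.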


If the form is a conditional, the reserved word COND is
stored on the function pushdown. The list address for the
second and successive predicate-expression pairs, the list
address for the first expression, and the list address for
the first predicate are stored in that order on the forms
pushdown. Taken from the top of the pushdown, these three
forms represent the predicate, the conclusion, and the
alternative, respectively, for a generalized conditional
expression. If the predicate is not a constant, it is input
to the processing element for evaluation. If the predicate
evaluates to true, the conclusion, which is next on the
forms pushdown, is evaluated, the alternative is discarded,
and the COND is popped off the function stack. If the
predicate evaluates to false, the COND is left on the
function stack, the conclusion is discarded, and the
alternative conditional expression is input to the
processing element for evaluation.


When the form input to a processing element is a
variable, the DCM uses the accompanying association list to


search for the corresponding constant value.


VII. ADDITIONAL PARALLEL PROCESSING CONSIDERATIONS


There remain several areas within the realm of parallel
processing of recursive functions which need further
research. Three of these areas will be discussed in this
section.


A. PARALLEL PROCESSING OF CONDITIONAL FORMS


Parallel processing of a conditional form means that
evaluations of the predicate, conclusion, and alternative
are begun in parallel. If any of these forms are themselves
conditionals, or contain conditionals, then they too are
processed in parallel. When the predicate evaluates to true
(or false), all processing generated by the alternative (or
conclusion) is halted, and any storage allocated for the
evaluation of the alternative (or conclusion) is reclaimed.


One of the problems with parallel processing of
conditional forms is that it is wasteful of memory. It
might happen that parallel processing of a conditional form
would exhaust memory before completion, whereas normal
processing could complete within available memory. Problems
associated with limited memory sizes, however, can be
expected to lessen as advances in memory technologies
continue to push cost down and volume up.


Another problem with parallel processing of conditional
forms concerns undefined forms. A conditional form is
considered defined if: 1) the predicate is defined; and 2)
the conclusion is defined if the predicate is true, or the
alternative is defined if the predicate is false. Hence, a
well-defined conditional form may have either an undefined
conclusion or an undefined alternative, but not both. A
system which processes conditional forms in parallel must be
prepared to deal with undefined forms. Some undefined forms
are recognizable while others are not. For example, if X is
an atom, then CAR<X> can be recognized as undefined.
Analysis of the halting problem has shown that some
undefined forms (e.g., some which recur infinitely) may not
be recognizable.


Assuming that a parallel processing system has
sufficient resources (memory and processors), it may still
be possible to process conditional forms in parallel. For
example, assume the alternative is undefined. As soon as
the predicate evaluates to true, processing could be stopped
on the alternative. This may be a difficult and
time-consuming task, however, since the alternative process
may have tied up an intricate net of processing elements.


Because there are so many unanswered questions
concerning parallel processing of conditional forms,
evalquote2 was designed to graph conditionals in the
traditional way.


B. THE MEANING OF SPEED-UP RATIOS


In the dual form of the data flow graph (the one where
the nodes represent data elements and the edges represent
primitive operations), the speed-up ratio can be defined as
the number of nodes with indegree greater than zero divided
by the maximum path length. This definition presupposes an
unweighted graph which is probably not true for any actual
implementation. For example, the procedures for ADD and MUL
in the Interpreter are much more complex than the procedures
for CAR and CDR. This was the primary reason for the
Interpreter's $A toggle which causes only the arithmetic
primitives to be graphed. It is not inconceivable to
imagine a parallel machine which has a "smart," associative
memory that performs the CAR and CDR functions
automatically, thus eliminating them from the data flow
graph altogether.
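The graph-theoretic definition of the speed-up ratio given above can be computed directly for a small example. The sketch below assumes an acyclic graph with at least one edge, represented as a mapping from each data element to the data elements computed from it; the function name and the sample graph, which stands for (a * b) + (c * d), are hypothetical.

```python
from functools import lru_cache


def speed_up_ratio(successors):
    """Speed-up ratio of an unweighted dual data flow graph.

    An entry a -> [b, ...] means each b is produced by a primitive
    operation applied to a (possibly together with other operands).
    """
    # Nodes with indegree greater than zero: one per primitive executed.
    produced = {node for outs in successors.values() for node in outs}
    nodes = set(successors) | produced

    @lru_cache(maxsize=None)
    def longest_path_from(node):
        return max((1 + longest_path_from(nxt)
                    for nxt in successors.get(node, ())), default=0)

    maximum_path_length = max(longest_path_from(node) for node in nodes)
    return len(produced) / maximum_path_length


graph = {
    "a": ["ab"], "b": ["ab"],      # ab = a * b
    "c": ["cd"], "d": ["cd"],      # cd = c * d
    "ab": ["sum"], "cd": ["sum"],  # sum = ab + cd
}
print(speed_up_ratio(graph))  # 1.5
```

Three data elements are produced by primitives and the longest path has two edges, so three sequential steps are completed in two parallel steps.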


In a machine where each instruction (primitive) requires
several timer states to complete, the data flow through the
primitive processes can be described by a weighted graph.
The weight assigned to each primitive represents the number
of timer states required for that primitive. There is still
the problem of coordinating data transfers between
processing elements which implies a need for
synchronization. Each stage in the parallel execution could
be timed to allow for the longest possible instruction.
This would be analogous to the unweighted graph. Or, each
stage could be timed to the longest instruction in the
stage. This could be implemented by each processing element
setting a ready line at instruction completion. The
processor manager would begin data transfers when all the
ready lines were set. A third alternative is to let each
processing element execute sequentially until there is data
to be transferred (a completed result or arguments to be
computed in parallel), and then to set a transfer ready
line. The processor manager would continuously monitor the
transfer ready lines. When a line goes true, the processor
manager would interrupt the destination and enable the
transfer.
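The first two timing schemes can be compared numerically. In the sketch below the weights (timer states per primitive) and the stage contents are invented purely for illustration, and the function names are hypothetical; the third, asynchronous scheme is not modeled.

```python
def fixed_stage_time(stages, weights):
    """Every stage lasts as long as the slowest primitive in the machine."""
    return len(stages) * max(weights.values())


def per_stage_time(stages, weights):
    """Each stage lasts as long as the slowest primitive in that stage."""
    return sum(max(weights[primitive] for primitive in stage)
               for stage in stages)


# Hypothetical timer-state counts; not measured on any machine.
weights = {"CAR": 1, "CDR": 1, "CONS": 2, "ADD": 4, "MUL": 8}
stages = [["MUL", "MUL", "CAR"], ["ADD", "CDR"], ["CONS"]]

print(fixed_stage_time(stages, weights))  # 24
print(per_stage_time(stages, weights))    # 14
```

Under these assumed weights, timing each stage to its own longest instruction saves 10 timer states over three stages, at the cost of the ready-line logic described above.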


C. THE WIDTH COMPUTATION


The width at each stage of a data flow graph has been
defined rather loosely as the number of primitive processes
to be executed at each stage. More precisely, the width at
stage i has been computed as the number of distinct
primitives representing step i in separate paths. The
significance of the width of a data flow graph is that it
represents the number of processors required at each stage
for a program execution represented by that data flow graph.
But what if the required number of processors for a given
stage were not available? Is there a way for some of the
processes to be delayed to future stages when sufficient
processors are available and yet not increase the number of
parallel steps required for the entire execution?


Consider once more the data flow graph of Figure 4. If
only three processors were available at stage two, it might
be possible to delay the first CAR operation, and hence the
first CONS operation, and still complete the execution in
seven parallel steps. Delaying the first CONS operation by
one stage would cause the width of the graph at stage 4 to
increase from 4 to 5. This increase could be avoided by
further delaying this CONS operation one or two more stages.


If the first CDR operation at stage 2 were delayed, it
would obviously cause an increase in the number of parallel
stages required for completion. How can the proper process
to be delayed be recognized? This would apparently require
some "look-ahead" capability not included in the parallel
system of the previous section. Without "looking ahead," it
may be possible to adopt a strategy which causes the "right"
processes to be delayed most of the time. For example, of
the four processes at stage 2 in Figure 4, two produce
arguments for a CONS and two produce arguments for the next
invocation of the PAIRLIS function. The two CAR's and
subsequent CONS represent a known number of required stages
(2), whereas the PAIRLIS function is recursive and the
number of stages required will depend on the data.
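For comparison, the effect of a full "look-ahead" can be sketched with a simple list scheduler that always starts the ready operations having the longest chain of dependents ahead of them. This is a standard critical-path heuristic substituted here for illustration; it is not a strategy proposed in this section, and the dependency graph below is a hypothetical one loosely modeled on a list-pairing computation, not a transcription of Figure 4.

```python
def schedule(predecessors, processors):
    """Assign operations to stages using at most `processors` per stage.

    predecessors maps every operation to the set of operations whose
    results it needs; every operation must appear as a key. Ready
    operations with the longest chain of dependents are started first.
    """
    if processors < 1:
        raise ValueError("at least one processor is required")
    successors = {op: [] for op in predecessors}
    for op, needed in predecessors.items():
        for earlier in needed:
            successors[earlier].append(op)

    chain_length = {}

    def chain(op):
        # Number of operations on the longest dependency chain from op.
        if op not in chain_length:
            chain_length[op] = 1 + max(
                (chain(nxt) for nxt in successors[op]), default=0)
        return chain_length[op]

    finished, remaining, stages = set(), set(predecessors), []
    while remaining:
        ready = sorted(
            (op for op in remaining if predecessors[op] <= finished),
            key=lambda op: (-chain(op), op))
        if not ready:
            raise ValueError("dependency cycle")
        stage = ready[:processors]
        stages.append(stage)
        finished.update(stage)
        remaining.difference_update(stage)
    return stages


# Hypothetical graph: two CARs feed a CONS while two CDRs feed further work.
graph = {
    "cdr_x": set(), "cdr_y": set(), "car_x": set(), "car_y": set(),
    "cons_pair": {"car_x", "car_y"},
    "car_x2": {"cdr_x"}, "car_y2": {"cdr_y"},
    "cons_pair2": {"car_x2", "car_y2"},
    "cons_tail": {"cons_pair2"},
    "cons_list": {"cons_pair", "cons_tail"},
}
print(len(schedule(graph, 4)))  # 5
print(len(schedule(graph, 3)))  # 5
print(schedule(graph, 3)[0])    # ['cdr_x', 'cdr_y', 'car_x']
```

With three processors the scheduler delays one CAR rather than a CDR at the first stage, and the stage count is unchanged, which is the kind of outcome the discussion above hopes to reach without explicit look-ahead.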


Hence, one strategy might be to flag recursive
functions, and when insufficient processors are available,
to delay non-recursive functions before delaying recursive
functions.


Another strategy that might be used to execute the
program in a minimum number of stages with a limited number
of processors requires a modification to the method of
evaluation used by evalquote2 (and also evalquote). These
universal functions evaluate another function and its
arguments by first evaluating all the arguments and then
applying the function. There are cases where some work can
be done in applying the function before all of the arguments
are evaluated. The PAIRLIS function of Figure 4 again
serves as an example. From the data flow graph it can be
seen that the second CDR of stage 2 (as well as the second
CDR of stage 4) could be delayed one stage without affecting
the total number of stages required. To do this would mean
commencing the second (and third) invocations of PAIRLIS
before all the arguments were evaluated. In other words, as
soon as the first argument was evaluated, it would enable
the first predicate, which only requires the first argument,
to be executed.


The goal of the two strategies mentioned so far is to
allow a reduced set of processors to still perform the
program execution in a minimum number of stages. These two
strategies as well as the goal they are seeking represent an
area for further research in the design of a parallel
processing system.


VIII. SUMMARY AND CONCLUSIONS


It seems appropriate, in summary, to abstract from the
preceding pages some organizational principles for the
construction of a parallel processing system. These
principles are inspired in part by the principles for
recursive machines presented in Ref. 2. The three
principles presented here represent the essential qualities
of a system designed for the parallel processing of
recursive functions. These principles are as follows.


1. Programming language operators (functions) are defined
recursively in terms of machine-level operators.


2. Parallel tasks are distributed among available
processors so that, at any time during program
execution, the internal machine structure, i.e., the
relationships between processors, represents the
structure of the executing program.


3. Processors share a common main memory in which the data
is stored associatively.


The first principle is quite similar to the first
principle for recursive computer organization discussed in
Ref. 2. By defining operators of the programming language
as recursive functions composed of previously defined
functions which are ultimately defined in terms of
machine-level functions, there are no limits to the language
levels possible. And yet, no matter how complex the
language operators become, the programmer is still
programming essentially in machine language, thus
eliminating the need for intermediate compilation.


With such a language structure, a user could define a 
set of functions which would represent a special purpose 
programming language for his particular problem area, rather 
than having to adapt to a general purpose "high-level" 


language. 


The second principle implies the need for some complex
intercommunication scheme between processors working on the
same problem. For example, should each processor be
connected to all other processors, or should processors be
arranged in some ideal network that provides "sufficient"
intercommunication? The example parallel processing system
proposed in the previous section suggests a single transfer
bus controlled by a processor manager. This "conveyor"
method has been included in earlier proposals for parallel
systems [12].


The second principle also implies the concept of
space-sharing as opposed to time-sharing. A user program
from a peripheral terminal would be allocated available
processors until completion rather than being paged in to a
single processor for a time-slice. As the time-slice
prevents a single user from monopolizing a time-sharing
system, similar controls could be provided in a parallel
system by limiting the number of processing elements or
processing modules available to a single user program.


The third principle suggests sharing a main memory among
the processing elements. This would eliminate the excessive
data transmissions that would occur if each processor had
its own memory. Storing the data associatively implies any
scheme in which the data is arranged to facilitate accessing
successive data elements. This principle has been
implemented in the past (and in the Interpreter) by building
linked-list data structures in linearly-organized memories.
For a memory which stores data in the form of
S-expressions, the CAR and CDR functions applied to a data
element represent access operations to the successor
elements of that data element. These two functions could be
performed automatically by a "smart" memory, and the
successor elements always made available to a processing
element if and when they are needed.
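
The "smart" memory idea can be illustrated with a minimal Python sketch. It is
only an assumption-laden model, not the hardware proposed in this thesis: the
names Cell, SmartMemory, fetch, from_list and length are hypothetical, and the
"memory" simply hands back a cell together with its CAR and CDR successors in a
single access.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cell:
    """One list cell of an S-expression (hypothetical model)."""
    car: Any
    cdr: Any


class SmartMemory:
    """Hypothetical memory that delivers both successors with every access."""

    def __init__(self):
        self.fetches = 0  # number of memory accesses made so far

    def fetch(self, cell):
        # A single access yields the cell and both of its successor elements,
        # so the processing element never issues separate CAR or CDR requests.
        self.fetches += 1
        return cell, cell.car, cell.cdr


def from_list(items):
    """Build a linked list of cells from a Python list (None ends the list)."""
    head = None
    for item in reversed(items):
        head = Cell(item, head)
    return head


def length(memory, cell):
    """Walk a list, using only the successors the memory hands back."""
    count = 0
    while cell is not None:
        _, _car, cell = memory.fetch(cell)
        count += 1
    return count


memory = SmartMemory()
three = from_list(["A", "B", "C"])
n = length(memory, three)
```

In this sketch, walking a three-element list costs three accesses, one per
cell, because the CDR successor arrives with each fetch.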


The concepts of parallel computation are not new. The
literature is rich with proposals for parallel machines,
summaries of such proposals, methods for recognizing
parallelism, and other related subjects. Parallel machines
of the past have been costly to construct and have required
complex software support. Meanwhile, sequential machines
have continued to achieve faster execution speeds. But now,
as the increases in execution speeds begin to level off, and
LSI technology brings the cost of parallel systems within
reason, the stage is set for a new and different generation
of computing machines. A proposal has been given for a
parallel processing system based on defining algorithms as
recursive functions. Evalquote2 has demonstrated that the
structure of algorithms defined as recursive functions makes
possible the distinction between parallel and sequential
tasks. The ideas presented in this thesis represent an
attempt to show that a parallel system based on simple,
highly-structured software can realize the speed-up of
parallel computation while avoiding the burden of complex
software.
APPENDIX A


SYNTAX 


The non-terminal symbols are in lower case letters. The
terminal symbols include <, >, ;, &, @, →, ., ", (, ), upper
case letters, and decimal digits. '::=' means "is defined
as." '|' means "or." '...' means any number of the
specified element.


program    ::= function <s-exp;...;s-exp>

function   ::= identifier |
               &<<variable;...;variable>; form> |
               @<identifier; function>

form       ::= constant | variable |
               function<argument;...;argument> |
               <form → form;...;form → form>

argument   ::= form | function

variable   ::= identifier

constant   ::= "atom" | (s-exp.s-exp) |
               (s-exp...s-exp)

s-exp      ::= atom | (s-exp.s-exp) |
               (s-exp...s-exp)

atom       ::= identifier | number

identifier ::= letter | identifier letter |
               identifier digit

number     ::= digit | number digit

letter     ::= A | B | C | ... | Z

digit      ::= 0 | 1 | 2 | ... | 9
APPENDIX B


EVALQUOTE2


* EVALQUOTE2

&<<FN; ARGS>;

   'SUB-FUNCTION NAMES'

   &<<APPLY2; EVAL2; EVCON2; GRAPHCON; EVLIS2;
      COMPOSE; COMBINE; APPEND; SUM;
      PAIRLIS; ASSOC; NULL;
      CAAR; CADR; CDAR; CADDR; CADAR>;

      'DEFINITION OF EVALQUOTE'
      APPLY2<FN; CONS<ARGS; ()>; "NIL">>

   'SUB-FUNCTION DEFINITIONS'

   <'APPLY2' &<<FN; X; A>;
      <ATOM<FN> →
         <EQ<FN; "CAR"> →
             CONS<CAAR<CAR<X>>; APPEND<CDR<X>; (1)>>;
          EQ<FN; "CDR"> →
             CONS<CDAR<CAR<X>>; APPEND<CDR<X>; (1)>>;
          EQ<FN; "CONS"> →
             CONS<CONS<CAAR<X>; CADAR<X>>;
                  APPEND<CDR<X>; (1)>>;
          EQ<FN; "ATOM"> →
             CONS<ATOM<CAAR<X>>; APPEND<CDR<X>; (1)>>;
          EQ<FN; "EQ"> →
             CONS<EQ<CAAR<X>; CADAR<X>>;
                  APPEND<CDR<X>; (1)>>;
          "T" → APPLY2<CAR<EVAL2<FN; A>>; X; A>>;
       EQ<CAR<FN>; "LAMBDA"> →
          COMPOSE<EVAL2<CADDR<FN>;
                        PAIRLIS<CADR<FN>; CAR<X>; A>>; CDR<X>>;
       EQ<CAR<FN>; "LABEL"> →
          APPLY2<CADDR<FN>; X;
                 CONS<CONS<CADR<FN>; CADDR<FN>>; A>>>>;

    'EVAL2' &<<E; A>;
       <ATOM<E> → CONS<CDR<ASSOC<E; A>>; "NIL">;
        ATOM<CAR<E>> →
           <EQ<CAR<E>; "QUOTE"> → CONS<CADR<E>; "NIL">;
            EQ<CAR<E>; "COND"> → EVCON2<CDR<E>; A>;
            "T" → APPLY2<CAR<E>; EVLIS2<CDR<E>; A>; A>>;
        "T" → APPLY2<CAR<E>; EVLIS2<CDR<E>; A>; A>>>;

    'EVCON2' &<<C; A>; GRAPHCON<EVAL2<CAAR<C>; A>; C; A>>;

    'GRAPHCON' &<<P; C; A>;
       <CAR<P> → COMPOSE<EVAL2<CADAR<C>; A>; CDR<P>>;
        "T" → COMPOSE<EVCON2<CDR<C>; A>; CDR<P>>>>;

    'EVLIS2' &<<L; A>;
       <NULL<L> → "NIL";
        "T" → COMBINE<EVAL2<CAR<L>; A>; EVLIS2<CDR<L>; A>>>>;

    'COMPOSE' &<<X; Y>; CONS<CAR<X>; APPEND<Y; CDR<X>>>>;

    'COMBINE' &<<U; V>;
       <NULL<V> → CONS<CONS<CAR<U>; "NIL">; CDR<U>>;
        "T" → CONS<CONS<CAR<U>; CAR<V>>;
                   SUM<CDR<U>; CDR<V>>>>>;

    'APPEND' &<<X; Y>; <NULL<X> → Y;
        "T" → CONS<CAR<X>; APPEND<CDR<X>; Y>>>>;

    'SUM' &<<X; Y>;
       <NULL<X> → Y; NULL<Y> → X;
        "T" → CONS<ADD<CAR<X>; CAR<Y>>;
                   SUM<CDR<X>; CDR<Y>>>>>;

    'PAIRLIS' &<<X; Y; A>;
       <NULL<X> → A; "T" → CONS<CONS<CAR<X>; CAR<Y>>;
                              PAIRLIS<CDR<X>; CDR<Y>; A>>>>;

    'ASSOC' &<<X; A>;
       <EQ<CAAR<A>; X> → CAR<A>; "T" → ASSOC<X; CDR<A>>>>;

    'NULL' &<<L>; EQ<L; "NIL">>;

    'CAAR' &<<L>; CAR<CAR<L>>>;

    'CADR' &<<L>; CAR<CDR<L>>>;

    'CDAR' &<<L>; CDR<CAR<L>>>;

    'CADDR' &<<L>; CADR<CDR<L>>>;

    'CADAR' &<<L>; CAR<CDAR<L>>>>>


* SAMPLE ARGUMENTS FOR EVALQUOTE

<(LABEL PAIRLIS (LAMBDA (X Y A)
    (COND ((EQ X (QUOTE NIL)) A)
          ((QUOTE T) (CONS (CONS (CAR X) (CAR Y))
                           (PAIRLIS (CDR X) (CDR Y) A))))));
 ((A B) (1 2) ((C.3)))>
#


****** EVALUATION BEGINS ******


RESULT IS:
means)? (8.2) (C.3)) 1 & 246 21 «1) 


FREE STORAGE REMAINING: 2768
039.08 SECONDS IN EXECUTION
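
The bookkeeping done by COMPOSE, COMBINE and SUM in the listing can be
paraphrased in Python. This is a hedged sketch, not the thesis program: each
evaluation result is modelled as a (value, vector) pair, where the vector holds
the number of operations active at each step, and the names compose, combine
and vec_sum are hypothetical stand-ins for the listed functions. Values are
plain Python objects rather than S-expressions.

```python
def vec_sum(x, y):
    """Element-wise sum of two step vectors; the longer tail is kept (SUM)."""
    n = max(len(x), len(y))
    return [(x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0)
            for i in range(n)]


def compose(x, y):
    """Sequential composition (COMPOSE): the steps in y precede those of x."""
    value, vector = x
    return (value, y + vector)


def combine(u, v):
    """Parallel combination (COMBINE): argument evaluations overlap in time."""
    if v is None:                      # last argument in the list
        return ([u[0]], u[1])
    return ([u[0]] + v[0], vec_sum(u[1], v[1]))


# Two arguments evaluated side by side: one needs 1 step, the other 2 steps.
a = ("A", [1])
b = ("B", [1, 1])
args = combine(a, combine(b, None))

# Applying a one-step primitive to those arguments adds one more step.
applied = compose(("R", [1]), args[1])
```

Here args carries the vector [2, 1]: two operations run in the first step and
one in the second. The vector of applied is [2, 1, 1], because the primitive
can only run once both arguments are available.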
APPENDIX C


TRANSLATION RULES


Rules for translating programs:


1.  A function is translated by the rules for translating
    functions.


2.  <s-exp;...;s-exp> translates to (s-exp...s-exp).


Rules for translating functions:


3.  &<<X1;...;XN>; form> translates to
    (LAMBDA (X1...XN) form*) where form* is the translation
    of a form.


4.  @<FN; function> translates to (LABEL FN function*)
    where function* is the translation of a function.


5.  If a function is an argument, then it translates to
    (QUOTE function*).


Rules for translating forms:

6.  "X" translates to (QUOTE X).


7.  If the form is a parenthesized s-expression, then it
    translates to (QUOTE (s-exp)).


8.  function<argument;...;argument> translates to
    (function* argument*...argument*), where argument* is
    the translation of an argument which can be a form or a
    function.


9.  <form → form;...;form → form> translates to
    (COND (form* form*)...(form* form*)).
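
The translation rules can be exercised with a small Python sketch. It is an
illustration under stated assumptions, not the translator used in this thesis:
the tagged-tuple representation of M-expressions ("lambda", "label", "const",
"var", "call", "cond", "fn") is hypothetical, S-expressions are modelled as
nested Python lists, and a variable is assumed to translate to itself, as the
sample translations suggest.

```python
def translate_function(fn):
    """Rules 3 and 4; an identifier stands for itself."""
    if isinstance(fn, str):
        return fn
    if fn[0] == "lambda":                       # &<<X1;...;XN>; form>
        _, variables, body = fn
        return ["LAMBDA", list(variables), translate_form(body)]
    if fn[0] == "label":                        # @<FN; function>
        _, name, inner = fn
        return ["LABEL", name, translate_function(inner)]
    raise ValueError("not a function: %r" % (fn,))


def translate_argument(arg):
    """Rule 5: a function used as an argument is quoted."""
    if arg[0] == "fn":
        return ["QUOTE", translate_function(arg[1])]
    return translate_form(arg)


def translate_form(form):
    """Rules 6 to 9, plus variables (assumed to translate to themselves)."""
    tag = form[0]
    if tag == "const":                          # "X" or (s-exp)
        return ["QUOTE", form[1]]
    if tag == "var":
        return form[1]
    if tag == "call":                           # function<argument;...>
        return ([translate_function(form[1])]
                + [translate_argument(a) for a in form[2]])
    if tag == "cond":                           # <form -> form;...>
        return ["COND"] + [[translate_form(p), translate_form(e)]
                           for p, e in form[1]]
    raise ValueError("not a form: %r" % (form,))


def show(sexp):
    """Print a nested list as a parenthesized S-expression."""
    if isinstance(sexp, str):
        return sexp
    return "(" + " ".join(show(item) for item in sexp) + ")"


def var(name):
    return ("var", name)


def call(fn, *args):
    return ("call", fn, list(args))


PAIRLIS = ("label", "PAIRLIS", ("lambda", ["X", "Y", "A"], ("cond", [
    (call("EQ", var("X"), ("const", "NIL")), var("A")),
    (("const", "T"),
     call("CONS",
          call("CONS", call("CAR", var("X")), call("CAR", var("Y"))),
          call("PAIRLIS", call("CDR", var("X")), call("CDR", var("Y")),
               var("A")))),
])))

translated = show(translate_function(PAIRLIS))
```

For the PAIRLIS definition, the sketch produces the same LABEL/LAMBDA/COND
expression that the interpreter lists as the translation in Appendix D.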




APPENDIX D 


SAMPLE PROGRAMS 


$$
$T
@<PAIRLIS; &<<X;Y;A>;
    <EQ<X;"NIL"> → A;
     "T" → CONS<CONS<CAR<X>; CAR<Y>>;
                PAIRLIS<CDR<X>;CDR<Y>;A>>
    >>>

<(A B); (1 2); ((C.3))>
#


****** TRANSLATION FOLLOWS ******

(LABEL PAIRLIS (LAMBDA (X Y A) (COND ((EQ X (QUOTE NIL)) A) ((QUOTE T) (
CONS (CONS (CAR X) (CAR Y)) (PAIRLIS (CDR X) (CDR Y) A))))))

((A B) (1 2) ((C.3)))


****** EVALUATION BEGINS ******


RESULT IS:
meena. 1) (8.2) (C.3)) 1 & 245 21 21) 


PROCESSORS REQUIRED FOR OPTIMUM PARALLELING. 4
EXECUTION STEPS (PARALLEL)................. 7
EXECUTION STEPS (SEQUENTIAL)............... 15
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 2.142857
FREE STORAGE REMAINING: 15925


002.96 SECONDS IN EXECUTION 
* THIS FUNCTION PERFORMS MATRIX MULTIPLICATION.
* DOT PRODUCTS ARE COMPUTED BY PERFORMING SEQUENTIAL
* ADDITIONS OF INTEGER PRODUCTS.

&<<A;B>;

   &<<MATMUL; ROW; DOT; NULL>;

      MATMUL<A;B>>

   <'MATMUL' &<<X;Y>; <NULL<X> → "NIL";
        "T" → CONS<ROW<CAR<X>; Y>; MATMUL<CDR<X>; Y>>>>;

    'ROW' &<<R;C>; <NULL<C> → "NIL";
        "T" → CONS<DOT<R; CAR<C>>; ROW<R; CDR<C>>>>>;

    'DOT' &<<U;V>; <NULL<U> → 0;
        "T" → ADD<MUL<CAR<U>; CAR<V>>; DOT<CDR<U>; CDR<V>>>>>;

    'NULL' &<<L>; EQ<L; "NIL">>>>

*SAMPLE INPUTS

<((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1));
 ((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1))>
#


****** EVALUATION BEGINS ******


RESULT IS:
Gece Gy h RY) (bh & & Rk) Ch ku) (kh & & &)) 1226 *& 10 8 20 18 32 20 4&6 


meee ome oe ton2o 51°19 19 15 10355 35322111171~21) 


PROCESSORS REQUIRED FOR OPTIMUM PARALLELITPNG. ee, 
EXECUTION STEPS (PARALLEL)................. 37
EXECUTION STEPS (SEQUENTIAL)............... 549
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 14.83784
FREE STORAGE REMAINING: 1 Naa 


019.56 SECONDS IN EXECUTION 
$$
$A GRAPHING ARITHMETIC OPERATIONS ONLY
* THIS FUNCTION PERFORMS MATRIX MULTIPLICATION.
* DOT PRODUCTS ARE COMPUTED BY PERFORMING SEQUENTIAL
* ADDITIONS OF INTEGER PRODUCTS.

$L SUPPRESS FUNCTION LISTING
$L TURN ON LISTING FOR ARGUMENTS
*SAMPLE INPUTS
<((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1));
 ((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1))>
#


****** EVALUATION BEGINS ******


RESULT IS:
(((4 4 4 4) (4 4 4 4) (4 4 4 4) (4 4 4 4)) 64 16 16 16 16)


PROCESSORS REQUIRED FOR OPTIMUM PARALLELING. 64
EXECUTION STEPS (PARALLEL)................. 5
EXECUTION STEPS (SEQUENTIAL)............... 128
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 25.59999
FREE STORAGE REMAINING: 11569


016.35 SECONDS IN EXECUTION
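
The four statistics printed after each run appear to follow directly from the
step vector that trails the result: its largest entry, its length, its sum, and
the quotient of the last two. The Python sketch below states that reading
explicitly; it is an inference from the printed figures, and the name
graph_statistics is hypothetical.

```python
def graph_statistics(step_vector):
    """Derive the printed run statistics from a data flow step vector.

    Each entry is the number of operations that can run in one parallel step.
    """
    processors = max(step_vector)          # widest step
    parallel_steps = len(step_vector)      # one time unit per entry
    sequential_steps = sum(step_vector)    # every operation done in turn
    speed_up = sequential_steps / parallel_steps
    return processors, parallel_steps, sequential_steps, speed_up


# Step vector of the arithmetic-only run: 64 products, then four rounds
# of 16 additions.
stats = graph_statistics([64, 16, 16, 16, 16])
```

For the vector 64 16 16 16 16 this gives 64 processors, 5 parallel steps,
128 sequential steps, and a speed-up ratio of 25.6.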





* THIS FUNCTION PERFORMS MATRIX MULTIPLICATION.
* DOT PRODUCTS ARE COMPUTED BY PERFORMING PARALLEL
* ADDITIONS OF PAIRS OF INTEGER PRODUCTS.

&<<A;B>;

   &<<MATMUL; ROW; DOT; SUM; REDUCE; VMUL; CADR; CDDR; NULL>;

      MATMUL<A;B>>

   <'MATMUL' &<<X;Y>; <NULL<X> → "NIL";
        "T" → CONS<ROW<CAR<X>; Y>; MATMUL<CDR<X>; Y>>>>;

    'ROW' &<<R;C>; <NULL<C> → "NIL";
        "T" → CONS<DOT<R; CAR<C>>; ROW<R; CDR<C>>>>>;

    'DOT' &<<U;V>; SUM<VMUL<U;V>>>;

    'SUM' &<<A>; <NULL<CDDR<A>> → ADD<CAR<A>; CADR<A>>;
        "T" → SUM<REDUCE<A>>>>;

    'REDUCE' &<<A>; <NULL<A> → "NIL"; NULL<CDR<A>> → A;
        "T" → CONS<ADD<CAR<A>; CADR<A>>; REDUCE<CDDR<A>>>>>;

    'VMUL' &<<U;V>; <NULL<V> → "NIL";
        "T" → CONS<MUL<CAR<U>; CAR<V>>; VMUL<CDR<U>; CDR<V>>>>>;

    'CADR' &<<L>; CAR<CDR<L>>>;

    'CDDR' &<<L>; CDR<CDR<L>>>;

    'NULL' &<<L>; EQ<L; "NIL">>>>

*SAMPLE INPUTS

<((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1));
 ((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1))>
#


****** EVALUATION BEGINS ******


RESULT IS:
meus Uh) (tb & & &) (6 & YG) (6 Gb kh &)) 1 226 & 10 8 20 Ik 32 20 & 
aoe coe oo 29) 4G 26 56 22 25 19 21 18 21 18 26 21 28 25 28 27 26 29 2 
momeco 25 17 22 1b 18 10313 7 §$ 56353223331 i2i2-21 1?) 


PROCESSORS REQUIRED FOR OPTINUM PARALLELENG. ob 
EXECUTION STEPS (PARALLEL)................. 59
EXECUTION STEPS (SEQUENTIAL)............... 1045
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 17.71185
FREE STORAGE REMAINING: 6954


038.90 SECONDS IN EXECUTION 
$$
$A GRAPHING ARITHMETIC OPERATIONS ONLY

* THIS FUNCTION PERFORMS MATRIX MULTIPLICATION.
* DOT PRODUCTS ARE COMPUTED BY PERFORMING PARALLEL
* ADDITIONS OF PAIRS OF INTEGER PRODUCTS.

$L SUPPRESS FUNCTION LISTING
$L LIST ARGUMENTS
*SAMPLE INPUTS
<((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1));
 ((1 1 1 1) (1 1 1 1) (1 1 1 1) (1 1 1 1))>
#


****** EVALUATION BEGINS ******


RESULT IS:
(((4 4 4 4) (4 4 4 4) (4 4 4 4) (4 4 4 4)) 64 32 16)


PROCESSORS REQUIRED FOR OPTIMUM PARALLELING. 64
EXECUTION STEPS (PARALLEL)................. 3
EXECUTION STEPS (SEQUENTIAL)............... 112
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 37.33333
FREE STORAGE REMAINING: 7122


031.96 SECONDS IN EXECUTION
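
The difference between the two dot-product formulations can be seen by counting
addition steps. The Python sketch below is an illustration only: the names
sequential_sum, reduce_pairs and parallel_sum are hypothetical, and the sketch
counts rounds of additions rather than reproducing the interpreter's graphing.

```python
def sequential_sum(products):
    """Mirror of DOT in the sequential program: one addition per product,
    each waiting on the previous one (the last product is added to 0)."""
    total, steps = 0, 0
    for value in reversed(products):
        total = value + total
        steps += 1
    return total, steps


def reduce_pairs(values):
    """Mirror of REDUCE: add neighbouring pairs; an odd element is carried."""
    out = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
    if len(values) % 2:
        out.append(values[-1])
    return out


def parallel_sum(products):
    """Mirror of SUM: repeat pairwise reduction until one value remains.
    All additions within one round are independent of each other."""
    steps = 0
    while len(products) > 1:
        products = reduce_pairs(products)
        steps += 1
    return products[0], steps


seq = sequential_sum([1, 1, 1, 1])
par = parallel_sum([1, 1, 1, 1])
```

For four products the sequential form needs four addition steps and the
pairwise form needs two, which matches the step vectors 64 16 16 16 16 and
64 32 16 reported for the 4 by 4 example once the 16 dot products are run
side by side.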




* DIFFERENTIATE A POLYNOMIAL WITH RESPECT TO
* ONE OF ITS VARIABLES. SIMPLIFY THE RESULT.


&<<P;X>;   'SIMPLIFY DERIVATIVE OF "P" W.R.T. "X"'
  &<<CADR; CADDR; CONS3>;   'DEFINE PRIMITIVES ON A-LIST'
    @<SIMP; &<<P>;   'SIMPLIFY FUNCTION'

       <ATOM<P> → P;

        "T" →

          &<<P1; OP; P2>;

             <EQ<OP; "+"> →
                <EQ<P1; "0"> → P2;
                 EQ<P2; "0"> → P1;
                 "T" → CONS3<P1; "+"; P2>
                >;
              "T" →
                <EQ<P1; "0"> → "0";
                 EQ<P2; "0"> → "0";
                 EQ<P1; "1"> → P2;
                 EQ<P2; "1"> → P1;
                 "T" → CONS3<P1; "*"; P2>
                >
             >
          >
          <SIMP<CAR<P>>; CADR<P>; SIMP<CADDR<P>>>
       >
    >>


    <   'ARGUMENT FOR SIMP'
       @<DIFF; &<<P; X>;   'DERIVATIVE FUNCTION'

          <ATOM<P> → <EQ<P; X> → "1"; "T" → "0">;
           "T" → &<<P1; OP; P2>;   'MORE THAN ONE TERM'

              <EQ<OP; "+"> →
                  CONS3<DIFF<P1; X>; "+"; DIFF<P2; X>>;

               "T" → CONS3<CONS3<DIFF<P1; X>; "*"; P2>; "+";
                           CONS3<DIFF<P2; X>; "*"; P1>>>>

              <CAR<P>; CADR<P>; CADDR<P>>>>>

       <P; X>   'ARGUMENTS FOR DIFF'
    >   'END OF SIMP ARGUMENT'
  >


  <   'PRIMITIVE DEFINITIONS'

     &<<X>; CAR<CDR<X>>>;

     &<<X>; CADR<CDR<X>>>;

     &<<X;Y;Z>; CONS<X; CONS<Y; CONS<Z; "NIL">>>>
  >   'END PRIMITIVE DEFINITIONS'


>   'END'
* SAMPLE ARGUMENTS WITH SYMMETRICALLY ORGANIZED
* FOURTH-DEGREE BINOMIAL.


<(((X + Y) * (X + Y)) * ((X + Y) * (X + Y))); X>
****** EVALUATION BEGINS ******


RESULT IS:
rc CX * ¥) © (Xe 9) *e CCK + ¥) * CX + Y)?YD?D) + COCX + YY?) + (CX + Y+d?Y «& € 


mero © (Xk * Veto 2 deb 2 Gb 22612 8k eBeBEA HERA 2 2 
ee ee leet eo dee eee 18 18 25 25 20 20 1k 13 119 8 7 6 6G 


Domo Seto 5 2 eeecee alee eee 2 2°11 tt iii i) 


PROCESSORS REQUIRED FOR OPTIMUM PARALLELING. 23
EXECUTION STEPS (PARALLEL)................. 81
EXECUTION STEPS (SEQUENTIAL)............... 422
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 5.209876
FREE STORAGE REMAINING: 11739


016.96 SECONDS IN EXECUTION
$$
* DIFFERENTIATE A POLYNOMIAL WITH RESPECT TO
* ONE OF ITS VARIABLES. SIMPLIFY THE RESULT.

$L SUPPRESS FUNCTION LISTING
$L TURN ON LISTING FOR ARGUMENTS
* SAMPLE ARGUMENTS WITH ASYMMETRICALLY ORGANIZED
* FOURTH-DEGREE BINOMIAL.

<((X + Y) * ((X + Y) * ((X + Y) * (X + Y)))); X>
#


****** EVALUATION BEGINS ******


RESULT IS:
met ee CX oy een tay) CCC CX + YY) * (X + ¥)) + (CCX + ¥) + 


rr) ee eo) eee) eles ee 1 1 2642724853559 6 3 3 
cece eee et lettin it i Pl li>s3 44 7 7 8 8 12 12 
Pome e te gh 18 18 ly 16 93 15 11 11 ss 7 Sub u hh hehehe 556 5 5 53 2 2 


eee ceili TP obobiyi tl idididaiidi1i1ii1) 


PROCESSORS REQUIRED FOR OPTIMUM PARALLELING. 18
EXECUTION STEPS (PARALLEL)................. 112
EXECUTION STEPS (SEQUENTIAL)............... 452
SPEED-UP RATIO (SEQUENTIAL/PARALLEL)........ 4.035714
FREE STORAGE REMAINING: 11332


017.76 SECONDS IN EXECUTION 
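
The differentiation program can be paraphrased in Python to make its two passes
easier to follow. This is a sketch under stated assumptions rather than a
transcription: a polynomial is modelled as either an atom (a string) or a
three-element tuple (P1, OP, P2) with OP equal to "+" or "*", and the function
names diff and simp are borrowed from the listing.

```python
def diff(p, x):
    """Derivative of p with respect to x (sum rule and product rule)."""
    if isinstance(p, str):                       # atom: variable or constant
        return "1" if p == x else "0"
    p1, op, p2 = p
    if op == "+":
        return (diff(p1, x), "+", diff(p2, x))
    # product rule: P1' * P2 + P2' * P1
    return ((diff(p1, x), "*", p2), "+", (diff(p2, x), "*", p1))


def simp(p):
    """Remove additions of 0 and multiplications by 0 or 1, bottom up."""
    if isinstance(p, str):
        return p
    p1, op, p2 = simp(p[0]), p[1], simp(p[2])
    if op == "+":
        if p1 == "0":
            return p2
        if p2 == "0":
            return p1
        return (p1, "+", p2)
    if p1 == "0" or p2 == "0":
        return "0"
    if p1 == "1":
        return p2
    if p2 == "1":
        return p1
    return (p1, "*", p2)


binomial = ("X", "+", "Y")
square = (binomial, "*", binomial)
d_binomial = simp(diff(binomial, "X"))
d_square = simp(diff(square, "X"))
```

The two recursive calls in each function are independent of one another, which
is the source of the parallelism that the interpreter reports for these runs.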
* EVALQUOTE

&<<FN; ARGS>;


   'SUB-FUNCTION NAMES'
   &<<APPLY; EVAL; EVCON; EVLIS; PAIRLIS; ASSOC;
      NULL; CAAR; CADR; CDAR; CADDR; CADAR>;


      'DEFINITION OF EVALQUOTE'
      APPLY<FN; ARGS; "NIL">>


   'SUB-FUNCTION DEFINITIONS'


   <'APPLY' &<<FN; X; A>;

      <ATOM<FN> →

         <EQ<FN; "CAR"> → CAAR<X>;

          EQ<FN; "CDR"> → CDAR<X>;

          EQ<FN; "CONS"> → CONS<CAR<X>; CADR<X>>;

          EQ<FN; "ATOM"> → ATOM<CAR<X>>;

          EQ<FN; "EQ"> → EQ<CAR<X>; CADR<X>>;

          "T" → APPLY<EVAL<FN; A>; X; A>>;
       EQ<CAR<FN>; "LAMBDA"> →

          EVAL<CADDR<FN>; PAIRLIS<CADR<FN>; X; A>>;
       EQ<CAR<FN>; "LABEL"> →

          APPLY<CADDR<FN>; X;

                CONS<CONS<CADR<FN>; CADDR<FN>>; A>>>>;


    'EVAL' &<<E; A>;
       <ATOM<E> → CDR<ASSOC<E; A>>;
        ATOM<CAR<E>> →
           <EQ<CAR<E>; "QUOTE"> → CADR<E>;
            EQ<CAR<E>; "COND"> → EVCON<CDR<E>; A>;
            "T" → APPLY<CAR<E>; EVLIS<CDR<E>; A>; A>>;
        "T" → APPLY<CAR<E>; EVLIS<CDR<E>; A>; A>>>;


    'EVCON' &<<C; A>; <NULL<C> → "UNDEFINED";
        EVAL<CAAR<C>; A> → EVAL<CADAR<C>; A>;
        "T" → EVCON<CDR<C>; A>>>;


    'EVLIS' &<<L; A>;
       <NULL<L> → "NIL";
        "T" → CONS<EVAL<CAR<L>; A>; EVLIS<CDR<L>; A>>>>;


    'PAIRLIS' &<<X; Y; A>;
       <NULL<X> → A; "T" → CONS<CONS<CAR<X>; CAR<Y>>;
                              PAIRLIS<CDR<X>; CDR<Y>; A>>>>;


    'ASSOC' &<<X; A>;
       <EQ<CAAR<A>; X> → CAR<A>; "T" → ASSOC<X; CDR<A>>>>;


    'NULL' &<<L>; EQ<L; "NIL">>;
    'CAAR' &<<L>; CAR<CAR<L>>>;
    'CADR' &<<L>; CAR<CDR<L>>>;
    'CDAR' &<<L>; CDR<CAR<L>>>;
    'CADDR' &<<L>; CADR<CDR<L>>>;
    'CADAR' &<<L>; CAR<CDAR<L>>>>>


* SAMPLE ARGUMENTS FOR EVALQUOTE

<(LABEL PAIRLIS (LAMBDA (X Y A)
    (COND ((EQ X (QUOTE NIL)) A)
          ((QUOTE T) (CONS (CONS (CAR X) (CAR Y))
                           (PAIRLIS (CDR X) (CDR Y) A))))));
 ((A B) (1 2) ((C.3)))>
#


****** EVALUATION BEGINS ******


RESULT IS:
macy, 1) (3.2) (CC. 30ND er ss 2 lt 
fr) 1 1 PA tr E21 2 2 


6 6 be5s8 68 7 6655 5 5 
Mem lil vil eyi 112i bid 
2435555 
meek 1 1 Pl le a ea ey 1 1 
PROCESSORS REQUIRED FOR OPTIMUM PARALLELING, 


meee Ol STEPS (PARALLEL)... 22.0 ace os . : 
EXECUTIOf STEPS (SEQUENTIAL), AOU Oe 
SPEED-UP RATIO (SEQUENTIAL/ PARALLEL) eereregade e- 
FREE STORAGE REMAINING: 8349


032.12 SECONDS IN EXECUTION 
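
For readers who prefer a conventional language, the EVALQUOTE listing can be
followed line by line in the Python sketch below. It is a hedged paraphrase,
not the interpreter of this thesis: S-expressions are modelled as nested
(car, cdr) tuples ending in the atom "NIL", atoms (including numbers) are
strings, truth values are assumed to be "T" and "NIL", and the helper sx, which
builds lists from Python lists, is hypothetical.

```python
NIL, T = "NIL", "T"


def cons(a, d): return (a, d)
def car(x): return x[0]
def cdr(x): return x[1]
def cadr(x): return car(cdr(x))
def caddr(x): return car(cdr(cdr(x)))


def sx(obj):
    """Build a cons list from a Python list; atoms and pairs pass through."""
    if isinstance(obj, list):
        out = NIL
        for item in reversed(obj):
            out = cons(sx(item), out)
        return out
    return obj


def pairlis(x, y, a):
    return a if x == NIL else cons(cons(car(x), car(y)),
                                   pairlis(cdr(x), cdr(y), a))


def assoc(x, a):
    return car(a) if car(car(a)) == x else assoc(x, cdr(a))


def apply_(fn, x, a):
    if isinstance(fn, str):                      # ATOM<FN>
        if fn == "CAR":
            return car(car(x))
        if fn == "CDR":
            return cdr(car(x))
        if fn == "CONS":
            return cons(car(x), cadr(x))
        if fn == "ATOM":
            return T if isinstance(car(x), str) else NIL
        if fn == "EQ":
            return T if car(x) == cadr(x) else NIL
        return apply_(eval_(fn, a), x, a)        # look the name up on the a-list
    if car(fn) == "LAMBDA":
        return eval_(caddr(fn), pairlis(cadr(fn), x, a))
    if car(fn) == "LABEL":
        return apply_(caddr(fn), x, cons(cons(cadr(fn), caddr(fn)), a))
    raise ValueError("cannot apply %r" % (fn,))


def eval_(e, a):
    if isinstance(e, str):
        return cdr(assoc(e, a))
    if car(e) == "QUOTE":
        return cadr(e)
    if car(e) == "COND":
        return evcon(cdr(e), a)
    return apply_(car(e), evlis(cdr(e), a), a)


def evcon(c, a):
    if c == NIL:
        return "UNDEFINED"
    if eval_(car(car(c)), a) != NIL:
        return eval_(cadr(car(c)), a)
    return evcon(cdr(c), a)


def evlis(l, a):
    return NIL if l == NIL else cons(eval_(car(l), a), evlis(cdr(l), a))


def evalquote(fn, args):
    return apply_(fn, args, NIL)


PAIRLIS_FN = sx(["LABEL", "PAIRLIS", ["LAMBDA", ["X", "Y", "A"],
    ["COND", [["EQ", "X", ["QUOTE", "NIL"]], "A"],
             [["QUOTE", "T"],
              ["CONS", ["CONS", ["CAR", "X"], ["CAR", "Y"]],
                       ["PAIRLIS", ["CDR", "X"], ["CDR", "Y"], "A"]]]]]])
ARGS = sx([["A", "B"], ["1", "2"], [("C", "3")]])
result = evalquote(PAIRLIS_FN, ARGS)
```

With the sample arguments of the listing, result is the list of dotted pairs
(A.1), (B.2), (C.3), built here as nested tuples.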
EVALQUOTE2
EVALQUOTE2 IS AN ALGOL-W PROGRAM DESIGNED TO
TRANSLATE AND INTERPRET INPUT PROGRAMS WRITTEN
Tie tooeocbe-banGUaGce OF PURE LisP. DURING 
INTERPRETATION, EVALQUOTE2 ANALYSES THE DATA FLOW
THROUGH THE LISP PROGRAM AND GENERATES A VECTOR
DESCRIBING THE DATA FLOW GRAPH.

ORGANIZATION
THE PROGRAM HAS BEEN LOGICALLY DIVIDED INTO ELEVEN
SECTIONS.
I: GLOBAL DECLARATIONS

II: PRIMITIVES

III: STORAGE MANAGEMENT

IV: INPUT SUPPORT
) ae OUTPUT 
*          SECTION I:  GLOBAL DECLARATIONS                  *
*                                                           *
*************************************************************;


R FLH: COMMENT 
R L2NGTH: COMMENT 
RL ena NO. COMMENT 
COMMENT ******************************************************
*                                                           *
*          SECTION III:  STORAGE MANAGEMENT                 *
*                                                           *
*************************************************************;


INTEGER PROCEDURE ALLOCATE;
COMMENT ALLOCATE ONE CELL
N


BEGIN
   INTEGER X;
   X := CDR(PLR);
   IF X = NIL THEN
      BEGIN
      IF INTRANS OR COLLECTED THEN
         ERROR
      ELSE
         BEGIN
         FREE_HASH_TABLE;
         X := CDR(PLR);
         END;


   SETCDR(PLR, CDR(X));
END ALLOCATE;
*          SECTION IV:  INPUT SUPPORT                       *
*                                                           *
*************************************************************;


0 feed eee PROCEDURE SAU LLOeC ees GER VALUE A); 
COMMENT CREATE ATOM-HEADER STRUCTURE FOR CONSTANT A;
PROCEDURE BUILD_ATOM (STRING(72) VALUE WORD;
                      INTEGER VALUE A,B);
COMMENT CREATE ATOM HEADER STRUCTURE FOR WORD OF
        LENGTH A AT LOCATION B;
BEGIN
   INTEGER PROCEDURE BUILD_PNAME (STRING(72) VALUE WORD;
                                  INTEGER VALUE A,P);
   COMMENT CONSTRUCT CELLS TO HOLD CHARACTERS OF WORD
           STARTING AT LOCATION P FOR 4 LETTERS OR UNTIL A;
   BEGIN
      INTEGER B,C;
      PROCEDURE INSERT_CHAR (STRING(1) VALUE LTR;
                             INTEGER VALUE A);
      COMMENT INSERT LTR IN CELL A;
         M(A) := (M(A) SHL 8) OR BITSTRING(DECODE(LTR));
      B := ALLOCATE;
      C := ALLOCATE;
      SETCAR(B,C);
      M(C) := 0;
      IF (A-P) < 5 THEN
         BEGIN
         SETCDR(B,NIL);
         FOR I := P UNTIL (A-1) DO INSERT_CHAR(WORD(I|1),C);
         FOR I := 1 UNTIL (4+P-A) DO M(C) := M(C) SHL 8;
         END
      ELSE
         BEGIN
         FOR I := P UNTIL (P+3) DO INSERT_CHAR(WORD(I|1),C);
         P := P+4;
         SETCDR(B, BUILD_PNAME(WORD,A,P));
         END;
      B
   END BUILD_PNAME;
   IF A <= 0 THEN ERROR(3);
   M(B) := M(B) OR #FFFF0000;
   SETCDR(B, CONS(PNAME, CONS(BUILD_PNAME(WORD,A,0), NIL)))
END BUILD_ATOM;
INTEGER PROCEDURE HASH (INTEGER VALUE ACCL;
                        STRING(72) VALUE ACCUM);
COMMENT COMPUTE AND RETURN HASHED VALUE OF TOK IN
        ACCUMULATOR GIVEN TOK LENGTH;
BEGIN
   INTEGER SUM,H;
   SUM := 0;
   FOR I := 0 UNTIL ACCL-1 DO SUM := SUM+DECODE(ACCUM(I|1));
   ae 26+SUM REM 128;
END HASH;
PROCEDURE PUSH (INTEGER VALUE X,I);
COMMENT PUSH ATOM ONTO HASH TABLE;
   SETCDR(I, CONS(X, CDR(I)));
COMMENT ******************************************************
*                                                           *
*          SECTION V:  PROPERTY LIST ACCESS                 *
PROCEDURE SCANNER;
COMMENT SCAN INPUT STREAM FOR NEXT TOK, ASSIGN
        eqrateet AND BUILD ATOM HEADER CELL IF REQUIRED;
BEGIN
   PROCEDURE GNC;
   COMMENT GET NEXT CHARACTER FROM INPUT;
   BEGIN
      PROCEDURE BUMP_IBP;
      COMMENT INPUT BUFFER POINTER;
      BEGIN
         PROCEDURE READBUF;
         COMMENT INPUTS NEXT RECORD, OUTPUTS
                 LISTING, MONITORS COMMENTS AND FLAGS;
         BEGIN
            PROCEDURE SET_FLAGS (STRING(1) VALUE A);
            COMMENT ALTER THE APPROPRIATE FLAG;
               IF A="$" THEN FLAGS := ¬FLAGS
               ELSE IF A="L" THEN LIST := ¬LIST
               ELSE IF A="T" THEN TRANS := ¬TRANS
               ELSE IF A="A" THEN ARITH := ¬ARITH
               ELSE WRITE("INVALID FLAG CALLED ::", A);
            PROCEDURE GETCARD;
            COMMENT READ AND LIST A DATA CARD;
            BEGIN
               READCARD(BUF);
               IF BUF(0|1) = "$" THEN
                  BEGIN
                  SET_FLAGS(BUF(1|1));
                  IF FLAGS THEN WRITE(BUF);
                  END
               ELSE
                  BEGIN INTFIELDSIZE := 4;
                  LINE_NO := LINE_NO + 1;
                  IF LIST THEN WRITE(LINE_NO, " ", BUF);
                  INTFIELDSIZE := 18;
                  END
            END GETCARD;
Sarte (au Op ipa") OR (BUF (OF1)="E") DO 
54 2 ? =! OR (BUF(O}1)="¢s 
aEt ex aD: 
            IBP := 0
         END READBUF;
         IBP := IBP+1;
         IF IBP >= 80 THEN READBUF;
      END BUMP_IBP;
      PROCEDURE SKIP_COMMENT;
      COMMENT SKIP OVER COMMENTS IN INPUT;
      BEGIN
         BUMP_IBP;
         WHILE BUF(IBP|1) ¬= "'" DO BUMP_IBP;
         BUMP_IBP;
         C := BUF(IBP|1)
      END SKIP_COMMENT;
      IF IBP >= 80 THEN BUMP_IBP;
      C := BUF(IBP|1);
      IF C = "'" THEN SKIP_COMMENT;
      BUMP_IBP;
      NC := BUF(IBP|1)
   END GNC;
   PROCEDURE LOOK_UP;
   COMMENT DETERMINE IF TOK HAS ALREADY BEEN
           STORED. IF NOT, CREATE ATOM HEADER CELL
           AND PNAME ATTRIBUTE-VALUE PAIR. RETURN POINTER
           TO ATOM HEADER CELL;


   BEGIN
      INTEGER ADDR;
*          SECTION VIII:  TRANSLATION                       *
*                                                           *
*************************************************************;
INTEGER PROCEDURE BUILDS;
COMMENT LEFT PARENTHESIS ALREADY SEEN;
BEGIN
   INTEGER T;
   SCANNER;
   IF TOK = ")" THEN NIL
   ELSE IF TOK = "." THEN
      BEGIN SCANNER;
      IF TOK = "(" THEN T := BUILDS
      ELSE T := HEADER;
      SCANNER;
      IF TOK ¬= ")" THEN SYN_ERR(7);
      END
   ELSE
      BEGIN
      IF (TYPE = IDENTIFIER) OR (TYPE = UMS) THEN
         T := HEADER
      ELSE IF TOK = "(" THEN T := BUILDS
      ELSE SYN_ERR(7);
      CONS(T, BUILDS)
      END
END BUILDS;
INTEGER PROCEDURE TFUNC;
COMMENT TRANSLATE AN M-EXPRESSION FUNCTION
        INTO AN INTERNAL S-EXPRESSION;
BEGIN
   INTEGER PROCEDURE LABEL_FUNC;
   COMMENT TRANSLATE A LABEL FUNCTION;
   BEGIN
      INTEGER FF, FUNC;
      SCANNER;
      IF TOK ¬= "<" THEN SYN_ERR(4);
      SCANNER;
      FF := HEADER;
      SCANNER;
      IF TOK ¬= ";" THEN SYN_ERR(4);
      SCANNER;
      FUNC := TFUNC;
      SCANNER;
      IF TOK ¬= ">" THEN SYN_ERR(5);
      CONS3(LABEL, FF, FUNC)
   END LABEL_FUNC;
   INTEGER PROCEDURE LAMBDA_FUNC;
   COMMENT TRANSLATES A LAMBDA FUNCTION;
   BEGIN
      INTEGER PROCEDURE VARLIST;
      BEGIN INTEGER T; SCANNER;
         IF TOK = ">" THEN
            T := NIL
         ELSE IF (TOK = "<") OR (TOK = ";") THEN
            BEGIN SCANNER;
            T := CONS(HEADER, VARLIST);
            END
         ELSE SYN_ERR(10);
      END VARLIST;
      INTEGER VLIST, FORM;
      SCANNER;
      IF TOK ¬= "<" THEN SYN_ERR(5);
      VLIST := VARLIST;
      SCANNER;
      IF TOK ¬= ";" THEN SYN_ERR(5);
      FORM := TFORM;
      SCANNER;
      IF TOK ¬= ">" THEN SYN_ERR(5);
      CONS3(LAMBDA, VLIST, FORM)
   END LAMBDA_FUNC;
   INTEGER FN;
   IF TYPE = IDENTIFIER THEN FN := HEADER
   ELSE IF TOK = "&" THEN FN := LAMBDA_FUNC
   ELSE IF TOK = "@" THEN FN := LABEL_FUNC
   ELSE SYN_ERR(6);
END TFUNC;